
# altegrad_challenge

public · 3 stars · 0 forks · 0 issues

## Commits

Commits on branch `main`:

- `90939d3a6ae7c6c80f121679d1c2499014f96a43` (verified): Create LICENSE, cclementapa committed 3 years ago
- `9482c5ce97150ae4af598fd236f3b2bbe50d2d2c` (unverified): update slides, cclementapa committed 3 years ago
- `0f36939c355b7fed508bdfdd0174989abc2e939d` (unverified): update readme, cclementapa committed 3 years ago
- `be4969cb37ec16e2f24c6d30842aff8b9ff065c2` (unverified): correct links, cclementapa committed 3 years ago
- `b39ab9ef9e2aaa7fd20aa167e946129c613e93b0` (unverified): add report slides and update readme, cclementapa committed 3 years ago
- `19d120e3192df3332865682d33bba4b457a60567` (unverified): update readme, cclementapa committed 3 years ago

## README

# Altegrad 2021-2022 - Citation Prediction Challenge

Authors: Apavou Clément & Belkada Younes & Zucker Arthur

Built with Python, PyTorch, and PyTorch Lightning.

The Kaggle challenge is hosted here: https://www.kaggle.com/c/altegrad-2021/leaderboard

## 🔎 Introduction

In this challenge, we are given a large scientific citation graph in which each node corresponds to an article. The dataset consists of 138,499 vertices, i.e. articles, each with its associated abstract and list of authors. The goal is to predict whether two nodes cite each other, given all this information. In the next sections, we elaborate on the intuitions behind our approaches and present the obtained results, along with some possible interpretations for each observation. The provided code corresponds to the code we used for the best model (i.e. the right commit).
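To make the task concrete, a link-prediction pipeline of this kind typically turns a candidate pair of articles into a small feature vector (e.g. similarity of their abstract embeddings, shared authors) before feeding it to a classifier. The helper names, toy embeddings, and feature choices below are illustrative assumptions, not the repository's actual code:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def common_authors(authors_u, authors_v):
    # Number of authors shared by the two articles.
    return len(set(authors_u) & set(authors_v))

def edge_features(emb_u, emb_v, authors_u, authors_v):
    # Feature vector for one candidate citation edge.
    return [cosine(emb_u, emb_v), common_authors(authors_u, authors_v)]

# Toy example: two articles with 3-d "abstract embeddings".
f = edge_features([1.0, 0.0, 1.0], [1.0, 0.0, 0.0],
                  ["A. Smith"], ["A. Smith", "B. Lee"])
# f[0] is the abstract similarity, f[1] the shared-author count.
```

A downstream model (such as the MLP described later in this README) would then score each such feature vector as "cites" or "does not cite".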

## 🔨 Getting started

pip3 install -r requirements.txt

Then,

sh download_data.sh
python3 main.py

## 📍 Tips

The best model can be reproduced from the best-model branch, which does not use this implementation of the code. That branch contains the final version of the code: it allows customization of the various embeddings and corresponds to the latest version of the code.

## 🔎 Results

| Model | Validation loss | Test loss (private leaderboard) | Run |
|---|---|---|---|
| Best model | 0.07775 | 0.07939 | |
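The reported scores appear to be log loss (binary cross-entropy), the usual metric for this kind of Kaggle link-prediction challenge; a minimal sketch of how such a score is computed (the function name and example values are illustrative, not taken from the repository):

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    # Binary cross-entropy averaged over edge predictions.
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Four candidate edges: two true citations, two non-citations.
score = log_loss([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2])
```

Lower is better; confident wrong predictions are penalized heavily, which is why predicted probabilities are clipped away from 0 and 1.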

All experiments are available on wandb.

## ♦️ Best MLP architecture

(Architecture diagram)

## 📎 Presentation of our work

Report & Slides

## 🔧 Some tools used

Hugging Face · KeyBERT

## Some citations

@misc{cohan2020specter,
      title={SPECTER: Document-level Representation Learning using Citation-informed Transformers}, 
      author={Arman Cohan and Sergey Feldman and Iz Beltagy and Doug Downey and Daniel S. Weld},
      year={2020},
      eprint={2004.07180},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}