allenai / S2APLER
S2APLER: S2 Agglomeration of Papers with Low Error Rate (it's for academic paper clustering)
☆17Updated last year
Alternatives and similar repositories for S2APLER
Users that are interested in S2APLER are comparing it to the libraries listed below
- Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper)☆71Updated 2 years ago
- MultiCite code and data. Models are available on Huggingface.☆31Updated 3 years ago
- Dataset accompanying the SPECTER model☆137Updated 2 years ago
- ☆93Updated last year
- SciGen☆24Updated 3 years ago
- ☆18Updated 2 years ago
- SciRepEval benchmark training and evaluation scripts☆75Updated last year
- ☆37Updated 2 years ago
- Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers.☆54Updated last year
- ☆53Updated 3 years ago
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding"☆14Updated 2 years ago
- A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents☆25Updated 2 years ago
- Multidocument Summarization for Literature Review Shared Task 2022☆30Updated 2 years ago
- Simple Questions Generate Named Entity Recognition Datasets (EMNLP 2022)☆76Updated 2 years ago
- multimodal document analysis☆165Updated last year
- This is the code for our KILT leaderboard submissions (KGI + Re2G models).☆156Updated 3 months ago
- Pretraining Efficiently on S2ORC!☆165Updated 9 months ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification☆179Updated 2 years ago
- ☆91Updated 3 years ago
- Dense hybrid representations for text retrieval☆63Updated 2 years ago
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages.☆78Updated 3 years ago
- Measuring the Evolution of a Scientific Field through Citation Frames☆59Updated 6 years ago
- A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling.☆59Updated last year
- Data and models for the SciFact verification task.☆238Updated last year
- A Test Collection of Computer Science Papers for Faceted Query by Example☆21Updated 3 years ago
- A set of Python scripts for preprocessing the Wikidata JSON dump and running simple queries in an efficient manner.☆126Updated 9 months ago
- GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings☆43Updated last year
- PyTorch implementation and pre-trained models for ASP - Autoregressive Structured Prediction with Language Models, EMNLP 22. https://arxi…☆106Updated last year
- Cross language information retrieval pipeline☆18Updated 2 years ago
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network☆292Updated 10 months ago