allenai / S2APLER
S2APLER: S2 Agglomeration of Papers with Low Error Rate (it's for academic paper clustering)
☆21 Updated 2 months ago
Alternatives and similar repositories for S2APLER
Users interested in S2APLER are comparing it to the repositories listed below.
- Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper) ☆74 Updated last month
- MultiCite code and data. Models are available on Huggingface. ☆32 Updated 3 years ago
- ☆116 Updated 3 months ago
- A Test Collection of Computer Science Papers for Faceted Query by Example ☆22 Updated 4 years ago
- ☆59 Updated 4 years ago
- Dataset accompanying the SPECTER model ☆142 Updated 3 years ago
- Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers. ☆54 Updated 2 years ago
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding" ☆14 Updated 3 years ago
- ☆38 Updated 3 years ago
- Simple Questions Generate Named Entity Recognition Datasets (EMNLP 2022) ☆76 Updated 2 years ago
- Multidocument Summarization for Literature Review Shared Task 2022 ☆30 Updated 3 years ago
- Cross language information retrieval pipeline ☆19 Updated 2 weeks ago
- ☆46 Updated 3 years ago
- multimodal document analysis ☆166 Updated 2 months ago
- Dense hybrid representations for text retrieval ☆64 Updated 2 years ago
- The autoregressive information extraction system GenIE (Generative Information Extraction) implemented in PyTorch. ☆104 Updated 2 years ago
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network ☆297 Updated last year
- This is the code for our KILT leaderboard submissions (KGI + Re2G models). ☆157 Updated 4 months ago
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages. ☆79 Updated 3 years ago
- An official repository for MIA 2022 (NAACL 2022 Workshop) Shared Task on Cross-lingual Open-Retrieval Question Answering. ☆31 Updated 3 years ago
- PropSegmEnt is an annotated dataset for segmenting English text into propositions, and recognizing proposition-level entailment relations… ☆21 Updated 3 years ago
- A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents ☆28 Updated 3 years ago
- Retrieval-Augmented Generation battle! ☆62 Updated 6 months ago
- Long-context pretrained encoder-decoder models ☆96 Updated 3 years ago
- Dataset from the paper "Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering" (COLING 2022) ☆118 Updated 3 years ago
- SciGen ☆24 Updated 4 years ago
- The dataset and code for ACL 2022 paper "SciNLI: A Corpus for Natural Language Inference on Scientific Text" are released here. ☆28 Updated 2 years ago
- Submission archive for the MS MARCO passage ranking leaderboard ☆13 Updated 2 years ago
- A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution ☆35 Updated 2 years ago
- ☆11 Updated 3 years ago