maastrichtlawtech / bsard
A statutory article retrieval dataset in French. (ACL 2022)
☆40 · Updated last year
Alternatives and similar repositories for bsard
Users who are interested in bsard are comparing it to the repositories listed below.
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages. ☆76 · Updated 3 years ago
- GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings ☆43 · Updated last year
- ☆86 · Updated 2 months ago
- A multilingual version of MS MARCO passage ranking dataset ☆145 · Updated last year
- Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers. ☆53 · Updated last year
- Inquisitive Parrots for Search ☆193 · Updated 3 weeks ago
- A multi-purpose toolkit for table-to-text generation: web interface, Python bindings, CLI commands. ☆55 · Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆93 · Updated 2 years ago
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆40 · Updated 3 years ago
- Semantically Structured Sentence Embeddings ☆66 · Updated 8 months ago
- Efficient Attention for Long Sequence Processing ☆94 · Updated last year
- Using business-level retrieval system (BM25) with Python in just a few lines. ☆31 · Updated 2 years ago
- ☆54 · Updated 2 years ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆48 · Updated last year
- 🕸️ A graph-augmented dense statute retriever. (EACL 2023) ☆21 · Updated last year
- Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al. ☆97 · Updated last year
- The autoregressive information extraction system GenIE (Generative Information Extraction) implemented in PyTorch. ☆104 · Updated 2 years ago
- ☆47 · Updated 3 years ago
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆29 · Updated 2 years ago
- The official code for PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization ☆156 · Updated 2 years ago
- CLIR version of ColBERT ☆68 · Updated this week
- Dataset for NAACL 2021 paper: "QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization" ☆126 · Updated last year
- Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks. ☆109 · Updated 3 months ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆101 · Updated last year
- Code, datasets, and checkpoints for the paper "Improving Passage Retrieval with Zero-Shot Question Generation (EMNLP 2022)" ☆101 · Updated 2 years ago
- Ensembling Hugging Face transformers made easy ☆63 · Updated 2 years ago
- ☆43 · Updated 2 years ago
- ☆100 · Updated 2 years ago
- Automatically detect errors in annotated corpora. ☆47 · Updated last year
- multimodal document analysis ☆165 · Updated last year