ClimSocAna / tecb-de
German Text Embedding Clustering Benchmark
☆15 · Updated 10 months ago
Alternatives and similar repositories for tecb-de:
Users who are interested in tecb-de are comparing it to the libraries listed below.
- German Alpaca Dataset (Cleaned + Translated) ☆24 · Updated last year
- Software for transferring pre-trained English models to foreign languages ☆18 · Updated last year
- A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling. ☆56 · Updated 6 months ago
- Evaluate language models using multiple choice items ☆12 · Updated 3 weeks ago
- ☆26 · Updated 5 months ago
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning ☆29 · Updated 2 years ago
- Curriculum training ☆16 · Updated 2 weeks ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆70 · Updated 10 months ago
- GC4LM: A Colossal (Biased) language model for German ☆13 · Updated 3 years ago
- Semantically Structured Sentence Embeddings ☆66 · Updated 3 months ago
- A library of translation-based text similarity measures ☆25 · Updated last year
- ☆23 · Updated last month
- Framework for unified summarisation and evaluation of English documents using state-of-the-art models and measures. ☆31 · Updated 8 months ago
- ☆16 · Updated 2 years ago
- A spaCy custom component that extracts and normalizes temporal expressions ☆52 · Updated last year
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆28 · Updated 2 years ago
- GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23) ☆13 · Updated 7 months ago
- ☆10 · Updated 5 years ago
- A survey of corpora for Germanic low-resource languages and dialects ☆24 · Updated last month
- Automatically detect errors in annotated corpora. ☆47 · Updated last year
- A Python Commonsense Knowledge Inference Toolkit ☆63 · Updated last year
- ☆12 · Updated 4 months ago
- Repository with code for MaChAmp: https://aclanthology.org/2021.eacl-demos.22/ ☆83 · Updated 3 weeks ago
- ☆12 · Updated 3 years ago
- Python source code for EMNLP 2021 Findings paper: "Subword Mapping and Anchoring Across Languages". ☆13 · Updated 3 years ago
- Master's thesis: Exploring bias in German NLG (GPT-3 & GerPT-2). Applies regard classification and bias mitigation triggers. ☆14 · Updated 4 months ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆99 · Updated 9 months ago
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models" ☆26 · Updated 3 years ago
- ☆79 · Updated last month
- GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings ☆37 · Updated 10 months ago