LSX-UniWue / SuperGLEBer

German Language Understanding Evaluation Benchmark @NAACL24

☆11

Alternatives and similar repositories for SuperGLEBer

Users interested in SuperGLEBer are comparing it to the libraries listed below.


malteos / clp-transfer
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning
☆30 · Updated 2 years ago
ClimSocAna / tecb-de
German Text Embedding Clustering Benchmark
☆17 · Updated last year
malteos / llm-datasets
A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling.
☆59 · Updated 11 months ago
alexa / ramen
Software for transferring pre-trained English models to foreign languages.
☆18 · Updated 2 years ago
konstantinjdobler / focus
[EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"
☆32 · Updated last month
bjoernpl / GermanBenchmark
A repository containing the code for translating popular LLM benchmarks to German.
☆26 · Updated last year
CPJKU / wechsel
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.
☆82 · Updated 10 months ago
ZurichNLP / mbr
Minimum Bayes Risk Decoding for Hugging Face Transformers
☆58 · Updated last year
superlinear-ai / wtpsplit-lite
✂️ Sentence segmentation with wtpsplit's state-of-the-art Segment any Text (SaT) models
☆14 · Updated 2 weeks ago
zouharvi / tokenization-scorer
Simple-to-use scoring function for arbitrarily tokenized texts.
☆43 · Updated 5 months ago
ltgoslo / gpt-bert
Official implementation of "GPT or BERT: why not both?"
☆55 · Updated last month
tigerchen52 / GLADIS
GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23)
☆17 · Updated last year
flairNLP / familiarity
Label shift estimation for transfer difficulty with Familiarity.
☆10 · Updated 5 months ago
aiintelligentsystems / next-level-bert
☆15 · Updated last year
thakur-nandan / income
INCOME: An Easy Repository for Training and Evaluation of Index Compression Methods in Dense Retrieval. Includes BPR and JPQ.
☆24 · Updated last year
MilaNLProc / simple-generation
A Python package for running inference with Hugging Face language and vision-language checkpoints, wrapping many convenient features.
☆28 · Updated 10 months ago
ottowg / gsap-ner
☆10 · Updated 9 months ago
lm-pub-quiz / lm-pub-quiz
Evaluate language models using multiple choice items
☆13 · Updated 2 months ago
huggingface / that_is_good_data
☆66 · Updated last year
EuroEval / EuroEval
The robust European language model benchmark.
☆111 · Updated this week
cshaib / diversity
☆19 · Updated last month
huggingface / olm-training
Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data; it should also work with any Hugging Face text dataset.
☆93 · Updated 2 years ago
mainlp / germanic-lrl-corpora
A survey of corpora for Germanic low-resource languages and dialects
☆25 · Updated 7 months ago
worldbank / GISTEmbed
GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings
☆43 · Updated last year
gautierdag / tokenizer-bench
Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation"
☆19 · Updated last year
danielvarab / massive-summ
☆30 · Updated 2 years ago
helpmefindaname / transformer-smaller-training-vocab
Temporarily removes unused tokens during training to save RAM and improve speed.
☆24 · Updated last month
ielab / Starbucks
Starbucks: Improved Training for 2D Matryoshka Embeddings
☆21 · Updated 3 weeks ago
MadryLab / AT2
Attribute statements generated by LLMs to preceding tokens using attention weights.
☆15 · Updated 2 months ago
shayne-longpre / a-pretrainers-guide
☆72 · Updated 2 years ago