bjoernpl / GermanBenchmark
A repository containing the code for translating popular LLM benchmarks to German.
☆25 · Updated last year
Alternatives and similar repositories for GermanBenchmark
Users who are interested in GermanBenchmark are comparing it to the repositories listed below.
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆58 · Updated last year
- A framework for few-shot evaluation of autoregressive language models. ☆13 · Updated last year
- Code for Zero-Shot Tokenizer Transfer ☆128 · Updated 4 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆128 · Updated last year
- A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling. ☆59 · Updated 10 months ago
- Prune transformer layers ☆69 · Updated last year
- Evaluation pipeline for the BabyLM Challenge 2023. ☆75 · Updated last year
- How do transformer LMs encode relations? ☆48 · Updated last year
- Manage scalable open LLM inference endpoints in Slurm clusters ☆258 · Updated 10 months ago
- Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State ☆18 · Updated last year
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback ☆96 · Updated last year
- Experiments for efforts to train a new and improved T5 ☆77 · Updated last year
- Simple and scalable tools for data-driven pretraining data selection. ☆24 · Updated 3 months ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data; it should work with any Hugging Face text dataset. ☆93 · Updated 2 years ago
- Official implementation of "GPT or BERT: why not both?" ☆53 · Updated 2 months ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆101 · Updated last year
- Official Code for M-RᴇᴡᴀʀᴅBᴇɴᴄʜ: Evaluating Reward Models in Multilingual Settings (ACL 2025 Main) ☆28 · Updated 3 weeks ago
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆43 · Updated 6 months ago
- Functional Benchmarks and the Reasoning Gap ☆86 · Updated 8 months ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆75 · Updated 9 months ago
- A package dedicated to running benchmark agreement testing ☆16 · Updated 3 weeks ago
- Code for the ACL 2023 paper: "Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Sc… ☆30 · Updated last year