OpenGPTX / lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
☆12Updated 6 months ago
Alternatives and similar repositories for lm-evaluation-harness
Users who are interested in lm-evaluation-harness compare it to the libraries listed below.
- ☆20Updated last month
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.☆15Updated last year
- Aioli: A unified optimization framework for language model data mixing☆32Updated last year
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆40Updated last year
- Source code for Activated LoRA☆23Updated 2 months ago
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions."☆66Updated 2 years ago
- UQ: Assessing Language Models on Unsolved Questions☆30Updated 5 months ago
- Minimum Description Length probing for neural network representations☆20Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆61Updated last year
- ☆55Updated last year
- Tasks and tutorials using Graphcore's IPU with Hugging Face. Originally at https://github.com/gradient-ai/Graphcore-HuggingFace☆17Updated last year
- Advanced Reasoning Benchmark Dataset for LLMs☆47Updated 2 years ago
- GoldFinch and other hybrid transformer components☆45Updated last year
- Entailment self-training☆26Updated 2 years ago
- Train, tune, and infer Bamba model☆138Updated 7 months ago
- ☆38Updated 5 months ago
- ☆26Updated 2 years ago
- ☆91Updated last month
- The application is an end-user training and evaluation system for standard knowledge graph embedding models. It was developed to optimise …☆18Updated 8 months ago
- Fork of Flame repo for training of some new stuff in development☆19Updated 3 weeks ago
- Source-to-Source Debuggable Derivatives in Pure Python☆15Updated 2 years ago
- A repository for research on medium-sized language models.☆77Updated last year
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…☆31Updated 2 years ago
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"☆40Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆45Updated 3 months ago
- ☆18Updated last year
- ☆28Updated 9 months ago
- ☆23Updated 2 years ago
- Track the progress of LLM context utilisation☆55Updated 9 months ago
- [NeurIPS'24 LanGame workshop] On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability☆41Updated 6 months ago