huggingface / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆32 Updated 2 months ago
Alternatives and similar repositories for lm-evaluation-harness
Users who are interested in lm-evaluation-harness are comparing it to the libraries listed below.
- Verifiers for LLM Reinforcement Learning ☆50 Updated last month
- Codebase accompanying the Summary of a Haystack paper. ☆78 Updated 7 months ago
- ☆48 Updated 6 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 8 months ago
- Spherical Merge Pytorch/HF format Language Models with minimal feature loss. ☆121 Updated last year
- ☆120 Updated 7 months ago
- Complex Function Calling Benchmark. ☆100 Updated 3 months ago
- Official repository for paper "ReasonIR Training Retrievers for Reasoning Tasks". ☆132 Updated 2 weeks ago
- ☆47 Updated 8 months ago
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps" ☆122 Updated 9 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 Updated last year
- ☆38 Updated 10 months ago
- ☆114 Updated 2 months ago
- ☆120 Updated last month
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆110 Updated 8 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆61 Updated last year
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆72 Updated 9 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆77 Updated last year
- This is the official repository for Inheritune. ☆111 Updated 3 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆84 Updated 5 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆140 Updated 6 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆96 Updated last year
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆90 Updated 2 months ago
- ☆43 Updated 3 months ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆76 Updated 6 months ago
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory ☆58 Updated last month
- ☆45 Updated 9 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆217 Updated 6 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆135 Updated 6 months ago
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆54 Updated 7 months ago