lunary-ai / llm-benchmarks
LLM benchmarks
☆13 · Updated last year
Alternatives and similar repositories for llm-benchmarks
Users who are interested in llm-benchmarks are comparing it to the repositories listed below.
- Reasoning by Communicating with Agents ☆28 · Updated last month
- ☆21 · Updated last year
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 · Updated last year
- Public Inflection Benchmarks ☆68 · Updated last year
- LLMs as Collaboratively Edited Knowledge Bases ☆45 · Updated last year
- Score LLM pretraining data with classifiers ☆55 · Updated last year
- Implementation of Spectral State Space Models ☆16 · Updated last year
- ☆49 · Updated 7 months ago
- Minimum Description Length probing for neural network representations ☆19 · Updated 4 months ago
- Testing paligemma2 finetuning on reasoning dataset ☆18 · Updated 5 months ago
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience" ☆60 · Updated 2 months ago
- Based on the tree of thoughts paper ☆48 · Updated last year
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆40 · Updated last year
- Code repo for MathAgent ☆16 · Updated last year
- ☆22 · Updated last year
- ☆19 · Updated 2 weeks ago
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning ☆35 · Updated last year
- Advanced Reasoning Benchmark Dataset for LLMs ☆46 · Updated last year
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta ☆13 · Updated 7 months ago
- ☆34 · Updated 11 months ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆41 · Updated last year
- Google Research ☆46 · Updated 2 years ago
- Small, simple agent task environments for training and evaluation ☆18 · Updated 7 months ago
- Track the progress of LLM context utilisation ☆54 · Updated last month
- Repository for the code and dataset for the paper: "Have LLMs Advanced enough? Towards Harder Problem Solving Benchmarks For Large Langu… ☆39 · Updated last year
- ☆22 · Updated last month
- Make triton easier ☆47 · Updated 11 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 · Updated 9 months ago
- Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit ☆63 · Updated last year
- Image Diffusion block merging technique applied to transformers based Language Models. ☆54 · Updated 2 years ago