VILA-Lab / Open-LLM-Leaderboard
Open-LLM-Leaderboard: Open-Style Question Evaluation. Paper at https://arxiv.org/abs/2406.07545
☆50 · Updated last year
Alternatives and similar repositories for Open-LLM-Leaderboard
Users interested in Open-LLM-Leaderboard are comparing it to the repositories listed below.
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆60 · Updated last year
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆57 · Updated 8 months ago
- ☆145 · Updated 4 months ago
- ☆64 · Updated last year
- ☆20 · Updated last year
- ☆30 · Updated last year
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆32 · Updated 6 months ago
- ☆104 · Updated last year
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR 2025] ☆111 · Updated 11 months ago
- Exploration of automated dataset selection approaches at large scales. ☆52 · Updated 11 months ago
- Codebase for Instruction Following without Instruction Tuning ☆36 · Updated last year
- ☆142 · Updated 10 months ago
- DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems ☆65 · Updated last year
- ☆14 · Updated 2 years ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆85 · Updated last year
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆126 · Updated last year
- ☆108 · Updated 2 months ago
- ☆80 · Updated 10 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆65 · Updated last year
- [ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆75 · Updated 7 months ago
- [ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLM… ☆68 · Updated last year
- [COLING'25] Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers? ☆82 · Updated last year
- Code implementation of synthetic continued pretraining ☆148 · Updated last year
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models ☆53 · Updated last year
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ☆77 · Updated 9 months ago
- [NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs ☆49 · Updated last year
- "A Survey on Agent-as-a-Judge" ☆87 · Updated 3 weeks ago
- Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL 2024] ☆39 · Updated last year
- Long Context Extension and Generalization in LLMs ☆62 · Updated last year
- [ICML 2025] Predictive Data Selection: The Data That Predicts Is the Data That Teaches ☆60 · Updated 11 months ago