allenai / super-benchmarkLinks

☆49

Alternatives and similar repositories for super-benchmark

Users that are interested in super-benchmark are comparing it to the libraries listed below

Sorting:

r-three / RAD
Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
☆43Updated last month
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆42Updated last year
HazyResearch / aioli
Aioli: A unified optimization framework for language model data mixing
☆28Updated 9 months ago
kaistAI / Janus
[NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages
☆51Updated 3 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60Updated last year
oriyor / assistantbench
Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"
☆63Updated 11 months ago
clinicalml / co-llm
Co-LLM: Learning to Decode Collaboratively with Multiple Language Models
☆122Updated last year
martin-wey / CodeUltraFeedback
CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)
☆72Updated last year
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆48Updated 8 months ago
abhika-m / FAVA
☆74Updated last year
GAIR-NLP / benbench
Benchmarking Benchmark Leakage in Large Language Models
☆56Updated last year
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48Updated last year
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆82Updated last year
sher222 / LeReT
Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
☆51Updated last year
choosewhatulike / case2code
☆17Updated 7 months ago
bigcode-project / astraios
Astraios: Parameter-Efficient Instruction Tuning Code Language Models
☆62Updated last year
hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆90Updated 11 months ago
kaistAI / factual-knowledge-acquisition
☆23Updated 2 weeks ago
ahans30 / goldfish-loss
[NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs
☆92Updated last year
r-three / phatgoose
Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"
☆91Updated last year
hamishivi / EasyLM
Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆75Updated last year
limenlp / safer-instruct
This is the oficial repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"
☆17Updated last year
para-lost / ReBase
ReBase: Training Task Experts through Retrieval Based Distillation
☆29Updated 9 months ago
ScalerLab / JudgeBench
☆103Updated last year
architsharma97 / dpo-rlaif
☆100Updated last year
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆86Updated 5 months ago
JHU-CLSP / RATIONALYST
Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044
☆35Updated last year
voidism / Lookback-Lens
Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps"
☆141Updated last month
scandukuri / assistant-gate
☆29Updated last year
nyu-mll / ILF-for-code-generation
☆80Updated 7 months ago