felipemaiapolo / tinyBenchmarksLinks

Evaluating LLMs with fewer examples

☆160

Alternatives and similar repositories for tinyBenchmarks

Users that are interested in tinyBenchmarks are comparing it to the libraries listed below

Sorting:

msclar / formatspread
Code accompanying "How I learned to start worrying about prompt formatting".
☆107Updated last month
allenai / WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
☆233Updated 9 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88Updated 10 months ago
SalesforceAIResearch / LaTRO
☆117Updated 5 months ago
JinjieNi / MixEval
The official evaluation suite and dynamic data release for MixEval.
☆242Updated 8 months ago
architsharma97 / dpo-rlaif
☆99Updated last year
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆175Updated 4 months ago
ScalerLab / JudgeBench
☆91Updated 8 months ago
da03 / Internalize_CoT_Step_by_Step
☆187Updated 3 months ago
jakespringer / echo-embeddings
☆152Updated last year
dwzhu-pku / LongEmbed
LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)
☆140Updated 8 months ago
CodeCreator / WebOrganizer
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
☆58Updated 3 months ago
JacobPfau / fillerTokens
☆67Updated last year
r-three / phatgoose
Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"
☆86Updated last year
RulinShao / retrieval-scaling
Official repository for "Scaling Retrieval-Based Langauge Models with a Trillion-Token Datastore".
☆207Updated last month
hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆89Updated 8 months ago
jeffreysijuntan / lloco
The official repo for "LLoCo: Learning Long Contexts Offline"
☆118Updated last year
WildEval / ZeroEval
A simple unified framework for evaluating LLMs
☆235Updated 3 months ago
neulab / gemini-benchmark
☆149Updated last year
cohere-ai / magikarp
Code for the paper "Fishing for Magikarp"
☆160Updated 2 months ago
voidism / Lookback-Lens
Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps"
☆129Updated 11 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173Updated 6 months ago
wang-research-lab / agentinstruct
Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"
☆114Updated 10 months ago
shengliu66 / ICV
Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
☆182Updated 5 months ago
sher222 / LeReT
Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
☆49Updated 9 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60Updated 11 months ago
allenai / CommonGen-Eval
Evaluating LLMs with CommonGen-Lite
☆90Updated last year
allenai / infinigram-api
☆70Updated 2 weeks ago
princeton-nlp / HELMET
The HELMET Benchmark
☆161Updated 3 months ago
zai-org / ComplexFuncBench
Complex Function Calling Benchmark.
☆123Updated 6 months ago