UpstageAI / evalverse-IFEval

Submodule of evalverse forked from [google-research/instruction_following_eval](https://github.com/google-research/google-research/tree/master/instruction_following_eval)

☆14

Alternatives and similar repositories for evalverse-IFEval

Users who are interested in evalverse-IFEval are comparing it to the libraries listed below.

arcee-ai / DAM
☆55 Updated last year
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119 Updated last year
google-deepmind / xtr
XTR: Rethinking the Role of Token Retrieval in Multi-Vector Retrieval
☆58 Updated last year
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆58 Updated last month
mungg / FABLES
☆58 Updated last year
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆90 Updated last year
argilla-io / distilabel-spin-dibt
Repository containing the SPIN experiments on the DIBT 10k ranked prompts
☆24 Updated last year
orionw / promptriever
The first dense retrieval model that can be prompted like an LM
☆89 Updated 6 months ago
KyujinHan / Sakura-SOLAR-DPO
Sakura-SOLAR-DPO: Merge, SFT, and DPO
☆116 Updated last year
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 Updated 10 months ago
sail-sg / sailcraft
🚢 Data Toolkit for Sailor Language Models
☆94 Updated 9 months ago
cbh123 / llmboxing
LLM boxing matches
☆58 Updated last year
ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆34 Updated 7 months ago
teknium1 / transformers-gptq-quant
☆45 Updated 2 years ago
Digitous / LLM-SLERP-Merge
Spherical Merge Pytorch/HF format Language Models with minimal feature loss.
☆141 Updated 2 years ago
swj0419 / detect-pretrain-code-contamination
☆78 Updated last year
r-three / RAD
Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
☆45 Updated last month
enjalot / latent-data-modal
Using modal.com to process FineWeb-edu data
☆20 Updated 7 months ago
DunZhang / Stella
☆62 Updated last year
TristanThrush / i-am-a-strange-dataset
Repository for "I am a Strange Dataset: Metalinguistic Tests for Language Models"
☆45 Updated last year
wandb / llm-kr-eval
☆20 Updated last year
Hannibal046 / nanoColBERT
Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832).
☆79 Updated last year
akjindal53244 / Arithmo
Small and Efficient Mathematical Reasoning LLMs
☆72 Updated last year
jxmorris12 / cde
Code for training & evaluating Contextual Document Embedding models
☆200 Updated 6 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60 Updated last year
cloneofsimo / fim-llama-deepspeed
☆32 Updated last year
nexusflowai / NexusBench
Nexusflow function call, tool use, and agent benchmarks.
☆30 Updated 11 months ago
ltgoslo / bert-in-context
Official implementation of "BERTs are Generative In-Context Learners"
☆32 Updated 8 months ago
seonghyeonye / Flipped-Learning
[ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
☆116 Updated 5 months ago
RulinShao / retrieval-scaling
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".
☆219 Updated 3 weeks ago