philschmid / MixEval

The official evaluation suite and dynamic data release for MixEval.

☆11

Alternatives and similar repositories for MixEval

Users who are interested in MixEval are comparing it to the libraries listed below.


s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆55 · Updated 6 months ago
lancedb / ragged
☆20 · Updated 9 months ago
arcee-ai / DAM
☆53 · Updated 9 months ago
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆81 · Updated last week
MaxBelitsky / cache-steering
KV Cache Steering for Inducing Reasoning in Small Language Models
☆36 · Updated 2 weeks ago
UpstageAI / evalverse-IFEval
Submodule of evalverse forked from [google-research/instruction_following_eval](https://github.com/google-research/google-research/tree/m…
☆14 · Updated last year
NielsRogge / awesome-huggingface
Repository containing awesome resources regarding Hugging Face tooling.
☆47 · Updated last year
catid / lllm
Latent Large Language Models
☆18 · Updated 11 months ago
nexusflowai / NexusBench
Nexusflow function call, tool use, and agent benchmarks.
☆27 · Updated 7 months ago
samchaineau / llm_slerp_generation
Repo hosting code and materials related to speeding up LLM inference using token merging.
☆36 · Updated 2 weeks ago
foundation-model-stack / bamba
Train, tune, and run inference with the Bamba model
☆130 · Updated 2 months ago
argilla-io / distilabel-spin-dibt
Repository containing the SPIN experiments on the DIBT 10k ranked prompts
☆24 · Updated last year
kyegomez / Exa
Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min…
☆26 · Updated 8 months ago
krypticmouse / matryoshka-representation-learning
PyTorch implementation for MRL
☆19 · Updated last year
brendanhogan / picoDeepResearch
☆65 · Updated 2 months ago
allenai / infinigram-api
☆73 · Updated 3 weeks ago
weaviate / structured-rag
Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models
☆111 · Updated 3 months ago
Pleias / Various-Finetuning
Set of scripts to finetune LLMs
☆37 · Updated last year
facebookresearch / collaborative-reasoner
Source code for the collaborative reasoner research project at Meta FAIR.
☆99 · Updated 3 months ago
mistralai / mistral-evals
☆75 · Updated 3 months ago
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆53 · Updated 8 months ago
huggingface / huggingface-inference-toolkit
Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models.
☆83 · Updated this week
axolotl-ai-cloud / axolotl-cookbook
☆34 · Updated last week
thomasnormal / fewshot
☆28 · Updated last month
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆68 · Updated 3 months ago
deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆49 · Updated last year
allenai / olmo-cookbook
OLMost every training recipe you need to perform data interventions with the OLMo family of models.
☆37 · Updated this week
benediktstroebl / agent-evals
☆23 · Updated 2 months ago
salesforce / summary-of-a-haystack
Codebase accompanying the Summary of a Haystack paper.
☆79 · Updated 10 months ago
enjalot / latent-data-modal
Using modal.com to process FineWeb-edu data
☆20 · Updated 4 months ago