google / lmeval

☆222

Alternatives and similar repositories for lmeval

Users who are interested in lmeval are comparing it to the repositories listed below

rungalileo / agent-leaderboard
Ranking LLMs on agentic tasks
☆176Updated 2 weeks ago
cfahlgren1 / observers
A Lightweight Library for AI Observability
☆249Updated 5 months ago
ServiceNow / Fast-LLM
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
☆217Updated this week
jina-ai / correlations
Simple UI for debugging correlations of text embeddings
☆288Updated 2 months ago
anyscale / llm-router
Tutorial for building LLM router
☆221Updated last year
huggingface / yourbench
🤗 Benchmark Large Language Models Reliably On Your Data
☆367Updated this week
aymeric-roucher / GAIA
Beating the GAIA benchmark with Transformers Agents. 🚀
☆131Updated 5 months ago
SalesforceAIResearch / SFR-RAG
☆77Updated 6 months ago
weaviate / structured-rag
Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models
☆111Updated 3 months ago
menloresearch / ReZero
☆155Updated 3 months ago
jlscheerer / xtr-warp
XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR.
☆152Updated 3 months ago
argilla-io / synthetic-data-generator
Build datasets using natural language
☆505Updated 2 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆103Updated 4 months ago
LLMSELECTOR / LLMSELECTOR
☆73Updated 5 months ago
illuin-tech / vidore-benchmark
Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.
☆222Updated 3 weeks ago
weaviate / gorilla
Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM.
☆133Updated last month
BhabhaAI / dataformer
Solving data for LLMs - Create quality synthetic datasets!
☆150Updated 6 months ago
facebookresearch / collaborative-reasoner
Source code for the collaborative reasoner research project at Meta FAIR.
☆99Updated 3 months ago
MinorJerry / OpenWebVoyager
☆78Updated 9 months ago
microsoft / lost_in_conversation
Code that accompanies the public release of the paper Lost in Conversation (https://arxiv.org/abs/2505.06120)
☆148Updated last month
lamini-ai / Lamini-Memory-Tuning
Banishing LLM Hallucinations Requires Rethinking Generalization
☆276Updated last year
writer / writing-in-the-margins
☆118Updated 11 months ago
ali-bahrainian / RAG_best_practices
☆93Updated 4 months ago
hljoren / sufficientcontext
Official page for ICLR 2025 paper "Sufficient Context: A New Lens on Retrieval Augmented Generation Systems"
☆46Updated 2 weeks ago
apple / ml-superposition-prompting
☆145Updated last year
kolenaIO / autoarena
Rank LLMs, RAG systems, and prompts using automated head-to-head evaluation
☆105Updated 7 months ago
QuixiAI / spectrum
☆128Updated 3 months ago
DeepSoftwareAnalytics / Awesome-Agent4SE
☆96Updated 10 months ago
h2oai / enterprise-h2ogpte
Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform
☆87Updated last month
agent-husky / Husky-v1
Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and …
☆345Updated last year