NVIDIA-NeMo / Evaluator
Open-source library for scalable, reproducible evaluation of AI models and benchmarks.
☆194 · Updated this week
Alternatives and similar repositories for Evaluator
Users interested in Evaluator are comparing it to the libraries listed below.
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆273 · Updated last week
- Manage scalable open LLM inference endpoints in Slurm clusters ☆280 · Updated last year
- Reproducible, flexible LLM evaluations ☆337 · Updated 2 weeks ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆153 · Updated last year
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore" ☆224 · Updated last month
- A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks. ☆382 · Updated 7 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆287 · Updated this week
- ☆220 · Updated 3 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆356 · Updated 2 weeks ago
- Load compute kernels from the Hub ☆397 · Updated this week
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆361 · Updated this week
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆174 · Updated this week
- Complex Function Calling Benchmark. ☆165 · Updated last year
- ☆38 · Updated 5 months ago
- ☆232 · Updated 2 months ago
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models ☆496 · Updated 5 months ago
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆225 · Updated 7 months ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆316 · Updated 2 years ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language Models ☆250 · Updated last year
- Code for training & evaluating Contextual Document Embedding models ☆202 · Updated 8 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆255 · Updated last year
- Simple & Scalable Pretraining for Neural Architecture Research ☆307 · Updated 2 months ago
- Automatic evals for LLMs ☆579 · Updated last month
- Code that accompanies the public release of the paper Lost in Conversation (https://arxiv.org/abs/2505.06120) ☆206 · Updated 7 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆246 · Updated last year
- Benchmarking library for RAG ☆255 · Updated last week
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆228 · Updated this week
- An extension of the nanoGPT repository for training small MoE models. ☆236 · Updated 11 months ago
- A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers. ☆62 · Updated 7 months ago
- Code for the paper "Fishing for Magikarp" ☆180 · Updated 8 months ago