aastroza / structured-generation-benchmark
Structured Generation Evals
☆12 · Updated 6 months ago
Alternatives and similar repositories for structured-generation-benchmark:
Users interested in structured-generation-benchmark are comparing it to the libraries listed below.
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆33 · Updated last week
- ☆28 · Updated 7 months ago
- An attribution library for LLMs ☆38 · Updated 7 months ago
- ☆32 · Updated this week
- Training code for Sparse Autoencoders on Embedding models ☆38 · Updated last month
- ☆48 · Updated 5 months ago
- Simple GRPO scripts and configurations. ☆58 · Updated 2 months ago
- ☆48 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 · Updated 7 months ago
- Lightweight tools for quick and easy LLM demos ☆26 · Updated 7 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆83 · Updated 4 months ago
- Replicating O1 inference-time scaling laws ☆83 · Updated 4 months ago
- ☆33 · Updated 2 weeks ago
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback. ☆22 · Updated 3 weeks ago
- Probabilistic LLM evaluations. [CogSci2023; ACL2023] ☆73 · Updated 8 months ago
- ☆41 · Updated 2 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆39 · Updated 2 months ago
- Tools to make language models a bit easier to use ☆42 · Updated last week
- Understanding the correlation between different LLM benchmarks ☆29 · Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆77 · Updated 7 months ago
- Benchmark structured generation libraries ☆26 · Updated 5 months ago
- ☆33 · Updated 10 months ago
- Small, simple agent task environments for training and evaluation ☆18 · Updated 5 months ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆39 · Updated last month
- Experiments for efforts to train a new and improved t5 ☆77 · Updated last year
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆11 · Updated 2 weeks ago
- An introduction to LLM Sampling ☆77 · Updated 4 months ago
- NLP with Rust for Python 🦀🐍 ☆62 · Updated 10 months ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆31 · Updated last year
- A framework for evaluating function calls made by LLMs ☆37 · Updated 9 months ago