METR / eval-analysis-public
Public repository containing METR's DVC pipeline for eval data analysis
☆33 · Updated this week
Alternatives and similar repositories for eval-analysis-public:
Users who are interested in eval-analysis-public are comparing it to the repositories listed below.
- Simple GRPO scripts and configurations. ☆59 · Updated last month
- ☆15 · Updated 6 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆56 · Updated last week
- ☆48 · Updated 4 months ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆72 · Updated last week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 · Updated 7 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆63 · Updated 3 months ago
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace. ☆25 · Updated 9 months ago
- ☆27 · Updated 4 months ago
- Small, simple agent task environments for training and evaluation ☆18 · Updated 4 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆39 · Updated last month
- Official Code Release for "Training a Generally Curious Agent" ☆19 · Updated 3 weeks ago
- The official evaluation suite and dynamic data release for MixEval. ☆11 · Updated 6 months ago
- Large Language Model (LLM) powered evaluator for Retrieval Augmented Generation (RAG) pipelines. ☆25 · Updated 11 months ago
- ☆22 · Updated 11 months ago
- Chat Markup Language conversation library ☆55 · Updated last year
- ☆24 · Updated last year
- ☆26 · Updated 3 weeks ago
- ☆28 · Updated 6 months ago
- ☆38 · Updated 8 months ago
- ☆19 · Updated 5 months ago
- Train, tune, and infer Bamba model ☆87 · Updated 2 months ago
- ☆66 · Updated 10 months ago
- NanoGPT (124M) quality in 2.67B tokens ☆28 · Updated last month
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆67 · Updated 4 months ago
- ☆124 · Updated last week
- Train your own SOTA deductive reasoning model ☆81 · Updated 3 weeks ago
- ☆38 · Updated last month
- Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation ☆26 · Updated last month
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆15 · Updated this week