benediktstroebl / agent-evals
☆12 · Updated 3 months ago
Alternatives and similar repositories for agent-evals:
Users who are interested in agent-evals are comparing it to the libraries listed below:
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆14 · Updated 10 months ago
- ☆46 · Updated 2 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 · Updated last year
- Training hybrid models for dummies. ☆16 · Updated 3 weeks ago
- this is for fun, ain't it grand! ☆12 · Updated 8 months ago
- LLM reads a paper and produces a working prototype ☆44 · Updated last week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆52 · Updated 4 months ago
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification ☆11 · Updated last year
- Minimum Description Length probing for neural network representations ☆18 · Updated this week
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆22 · Updated last month
- Elevate your language models with insightful diversity metrics. ☆11 · Updated 11 months ago
- Learning to Retrieve by Trying: source code for "Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval" ☆28 · Updated 2 months ago
- Transform unstructured documents into actionable, structured data with enterprise-grade precision and reliability, ready for large-scale … ☆14 · Updated 3 weeks ago
- A public implementation of the ReLoRA pretraining method, built on Lightning AI's PyTorch Lightning suite. ☆33 · Updated 10 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 · Updated last year
- Implementation of Spectral State Space Models ☆18 · Updated 10 months ago
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" in PyTorch and Zeta ☆13 · Updated last month
- Implementation of the paper "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆47 · Updated last month
- NeurIPS 2024 tutorial on LLM inference ☆35 · Updated last month
- ☆12 · Updated 2 months ago
- PyTorch implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 · Updated 2 weeks ago
- Latent Large Language Models ☆17 · Updated 4 months ago
- NeurIPS 2023: Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆40 · Updated 9 months ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" ☆17 · Updated 3 weeks ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆40 · Updated 6 months ago
- A repository re-creating the PromptBreeder evolutionary algorithm from the DeepMind paper in Python, using LMQL as the backend. ☆27 · Updated last year
- Using multiple LLMs for ensemble forecasting ☆16 · Updated 11 months ago
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆25 · Updated this week
- Lottery Ticket Adaptation ☆37 · Updated last month