NousResearch / Open-Reasoning-Tasks
A comprehensive repository of reasoning tasks for LLMs (and beyond)
☆423 Updated 5 months ago
Alternatives and similar repositories for Open-Reasoning-Tasks:
Users who are interested in Open-Reasoning-Tasks are comparing it to the repositories listed below
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ☆224 Updated 10 months ago
- ☆438 Updated 5 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆168 Updated 2 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym ☆402 Updated last week
- awesome synthetic (text) datasets ☆264 Updated 4 months ago
- ☆150 Updated 3 months ago
- Long context evaluation for large language models ☆202 Updated 2 weeks ago
- ☆104 Updated 2 months ago
- A simple unified framework for evaluating LLMs ☆204 Updated last week
- ☆152 Updated 8 months ago
- AWM: Agent Workflow Memory ☆253 Updated last month
- System 2 Reasoning Link Collection ☆811 Updated this week
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆137 Updated last month
- AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and re… ☆267 Updated last week
- Just a bunch of benchmark logs for different LLMs ☆119 Updated 7 months ago
- ☆832 Updated 6 months ago
- Tutorial for building LLM router ☆187 Updated 8 months ago
- Automatic evals for LLMs ☆334 Updated this week
- A simple Python sandbox for helpful LLM data agents ☆235 Updated 9 months ago
- Automatically evaluate your LLMs in Google Colab ☆604 Updated 10 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆122 Updated this week
- Fast parallel LLM inference for MLX ☆174 Updated 8 months ago
- WIP - Allows you to create DSPy pipelines using ComfyUI ☆188 Updated 3 months ago
- Aidan Bench attempts to measure <big_model_smell> in LLMs. ☆287 Updated 2 weeks ago