Multi-agent synthetic data generation pipeline capable of generating and validating long-horizon terminal/coding tasks for RL training
☆55 Jul 28, 2025 Updated 7 months ago
Alternatives and similar repositories for tbench-agentic-data-pipeline
Users who are interested in tbench-agentic-data-pipeline are comparing it to the libraries listed below
- Convert GitHub PRs into Harbor tasks ☆47 Feb 27, 2026 Updated last week
- Terminal-Bench-Science: Evaluating AI Agents on Complex Real-World Scientific Workflows in the Terminal ☆25 Updated this week
- Run GEPA on your favorite non-python libraries. ☆33 Jan 22, 2026 Updated last month
- ☆12 Nov 5, 2024 Updated last year
- ⚔️ [ICLR 2026] Official code of "Search Arena: Analyzing Search-Augmented LLMs". ☆52 Feb 23, 2026 Updated 2 weeks ago
- A TypeScript library that enables AI agents to leverage MCP (Model Context Protocol) servers for enhanced capabilities. This library inte… ☆22 Aug 11, 2025 Updated 6 months ago
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight) ☆185 Feb 17, 2025 Updated last year
- ☆63 Dec 29, 2025 Updated 2 months ago
- [CHIL 2024] Interpretation of Intracardiac Electrograms Through Textual Representations ☆12 Sep 4, 2024 Updated last year
- ☆48 Oct 28, 2025 Updated 4 months ago
- Multi-step AI agents powered by Gemini 2.0 and the LangGraph framework. These agents orchestrate complex workflows and enhance their reas… ☆10 Dec 19, 2024 Updated last year
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆84 Oct 23, 2024 Updated last year
- ☆32 Feb 2, 2025 Updated last year
- A Claude Code plugin that solves the same problems as community frameworks (GSD, BMAD, Ralph, Agent OS) — but using the tool's native arc… ☆28 Mar 1, 2026 Updated last week
- Martingale posterior neural networks for fast sequential decision making @ Neurips 2025 ☆23 Nov 13, 2025 Updated 3 months ago
- AI-native knowledge kernel for human/agent collaboration. Use it as a Knowledge Base, Wiki, Annotator, Research Tool, or Agentic Memory. ☆29 Updated this week
- Software to enable data-rich collaboration from high-resolution display walls to your laptop ☆16 Updated this week
- Variational inference for Dirichlet process mixture models with multinomial mixture components. ☆35 Jan 15, 2014 Updated 12 years ago
- [ICLR 2026] This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA bench… ☆87 Jan 26, 2026 Updated last month
- ☆31 Feb 3, 2026 Updated last month
- Benchmark evaluating ocean forecasting systems against reference datasets and observations. ☆26 Updated this week
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆52 Jul 15, 2025 Updated 7 months ago
- A lightweight OAuth 2.0 Authorization Server supporting Device Authorization Grant (RFC 8628) and Authorization Code Flow with PKCE (RFC … ☆32 Updated this week
- JMLR Cover Letter Template ☆10 Dec 15, 2021 Updated 4 years ago
- ☆13 Oct 21, 2024 Updated last year
- [MLHC 2021] Model Selection for Offline RL: Practical Considerations for Healthcare Settings. https://arxiv.org/abs/2107.11003 ☆10 Oct 6, 2022 Updated 3 years ago
- The AI Alliance project to define a reference stack for AI model and system evaluation, with evaluations, benchmarks, and leaderboards. ☆13 Feb 15, 2026 Updated 3 weeks ago
- Fast, free, easy, and object-agnostic video anonymization ☆11 Dec 12, 2020 Updated 5 years ago
- The official implementation of COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence. ☆28 Dec 30, 2025 Updated 2 months ago
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" ☆10 May 5, 2024 Updated last year
- (CVPR 2026) Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation ☆27 Feb 28, 2026 Updated last week
- Auction Theory Toolbox – Computer Verified Auctions ☆14 Jul 12, 2016 Updated 9 years ago
- 🗂️ Project tempfiles backend server!! ☆10 Apr 29, 2024 Updated last year
- This repository contains codes for *Sem 2023 paper “Generative Data Augmentation for Aspect Sentiment Quad Prediction”. ☆11 May 30, 2023 Updated 2 years ago
- MCP server for Grok AI API integration ☆22 Jun 2, 2025 Updated 9 months ago
- ☆41 Oct 29, 2025 Updated 4 months ago
- Dataset and baseline for Coling 2022 long paper (oral): "ConFiguRe: Exploring Discourse-level Chinese Figures of Speech" ☆13 Jul 27, 2023 Updated 2 years ago
- ☆12 Mar 11, 2025 Updated 11 months ago
- ☆12 Mar 18, 2024 Updated last year