A testbed for agents and environments that can automatically improve models through data generation.
☆28 · Mar 4, 2025 · Updated last year
Alternatives and similar repositories for DataEnvGym
Users interested in DataEnvGym are comparing it to the libraries listed below.
- KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality ☆40 · Dec 1, 2025 · Updated 3 months ago
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆40 · Jul 13, 2024 · Updated last year
- Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks [ICLR 2026] ☆28 · Mar 13, 2026 · Updated last week
- Official Code for the NeurIPS 2024 paper "FactorSim: Generative Simulation via Factorized Representation" ☆14 · Sep 26, 2024 · Updated last year
- The repository for the paper "Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs" ☆14 · Dec 16, 2024 · Updated last year
- PISCO: Precise Video Instance Insertion with Sparse Control ☆49 · Feb 13, 2026 · Updated last month
- Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models) ☆52 · May 12, 2025 · Updated 10 months ago
- Scripts to evaluate various bias metrics for different NLG models + decoding algorithms ☆16 · Dec 6, 2023 · Updated 2 years ago
- ☆25 · Sep 23, 2024 · Updated last year
- Benchmarking of 1D pattern classification networks ☆10 · Jul 19, 2023 · Updated 2 years ago
- This repository provides open-source code for sparse continuous distributions and corresponding Fenchel-Young losses. ☆16 · May 10, 2023 · Updated 2 years ago
- Learning MLPs to replace GNN ☆10 · Jun 3, 2023 · Updated 2 years ago
- OpenPipe Reinforcement Learning Experiments ☆32 · Mar 14, 2025 · Updated last year
- Robot simulator using web technologies, just JavaScript ☆10 · Feb 13, 2020 · Updated 6 years ago
- 🟣 Linear Algebra interview questions and answers to help you prepare for your next machine learning and data science interview in 2026. ☆18 · Jan 4, 2026 · Updated 2 months ago
- Repo for our AKBC-2021 paper: Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering ☆10 · Oct 10, 2021 · Updated 4 years ago
- ☆13 · Jul 13, 2018 · Updated 7 years ago
- ☆15 · Sep 9, 2021 · Updated 4 years ago
- ☆10 · Jul 18, 2023 · Updated 2 years ago
- Python package for extractive NLP using the OpenAI API ☆17 · Aug 28, 2024 · Updated last year
- ☆11 · Nov 8, 2023 · Updated 2 years ago
- Terminal-Bench-Science: Evaluating AI Agents on Complex Real-World Scientific Workflows in the Terminal ☆41 · Updated this week
- ☆12 · Mar 3, 2022 · Updated 4 years ago
- ACL 2021 paper "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding" by Dongyeop Kang and Eduard Hovy ☆15 · Jul 19, 2021 · Updated 4 years ago
- High-Performance Linpack Benchmark adapted version for GPU backend ☆12 · Sep 12, 2022 · Updated 3 years ago
- Code for Predictive Engagement: An Efficient Metric for Automatic Evaluation of Open-Domain Dialogue Systems ☆16 · Jun 8, 2021 · Updated 4 years ago
- Training chatbot models with reinforcement learning in ParlAI. ☆17 · Dec 8, 2022 · Updated 3 years ago
- Download Web-10K data by querying Bing Image Search ☆10 · Feb 1, 2022 · Updated 4 years ago
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025) ☆35 · Jul 16, 2025 · Updated 8 months ago
- [EMNLP 2024] RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning ☆15 · May 13, 2025 · Updated 10 months ago
- Evaluation Pipeline for medical tasks. ☆12 · Feb 13, 2026 · Updated last month
- ISI tutorials ☆12 · Oct 28, 2016 · Updated 9 years ago
- Chain of Images for Intuitively Reasoning ☆10 · Nov 29, 2023 · Updated 2 years ago
- Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? ☆17 · Jun 3, 2025 · Updated 9 months ago
- ☆20 · Nov 11, 2019 · Updated 6 years ago
- Synthetic Data Generation with Execution-Based Verification and Grounding for LLM Training. ☆19 · Feb 7, 2025 · Updated last year
- A unified approach to explain conditional text generation models. Pytorch. The code of paper "Local Explanation of Dialogue Response Gene… ☆16 · Mar 21, 2022 · Updated 3 years ago
- An RL approach to enable cost-effective, intelligent interactions between a local agent and a remote LLM ☆79 · Aug 22, 2024 · Updated last year
- A Chrome extension that blocks content using any keywords a user specifies. ☆10 · Jul 16, 2020 · Updated 5 years ago