chchenhui/mlrbench

[NeurIPS 2025 D&B Track] MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research

☆32

Alternatives and similar repositories for mlrbench

Users who are interested in mlrbench are comparing it to the repositories listed below.

declare-lab / SAT
Code for the EMNLP 2022 Findings short paper "SAT: Improving Semi-Supervised Text Classification with Simple Instance-Adaptive Self-Train…
☆12 · Feb 25, 2023 · Updated 3 years ago
declare-lab / TEAM
Our EMNLP 2022 paper on MCQA
☆23 · Jan 15, 2023 · Updated 3 years ago
facebookresearch / ssorl
Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
☆43 · Jul 16, 2023 · Updated 3 years ago
declare-lab / DoubleMix
Code for the COLING 2022 paper "DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification"
☆19 · Oct 19, 2022 · Updated 3 years ago
jasonyux / TriPosT
☆12 · Jan 25, 2024 · Updated 2 years ago
zhen8838 / AnimeGAN
TensorFlow 2.0 implementation of AnimeGAN
☆12 · Apr 26, 2020 · Updated 6 years ago
amodaresi / MemLLM
☆13 · Aug 13, 2024 · Updated last year
Open-Social-World / autolibra
AutoLibra: Metric Induction for Agents from Open-Ended Human Feedback
☆19 · Apr 23, 2026 · Updated 2 months ago
zjunlp / Kformer
[NLPCC 2022] Kformer: Knowledge Injection in Transformer Feed-Forward Layers
☆39 · Oct 20, 2022 · Updated 3 years ago
Hongcheng-Gao / HAVEN
Code and data for the paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".
☆25 · Oct 22, 2025 · Updated 8 months ago
FrontierCS / Frontier-CS
A benchmark for evaluating LLMs on open-ended CS problems. Exploring the Next Frontier of Computer Science.
☆275 · Jul 15, 2026 · Updated last week
RUCKBReasoning / CodeRM
Official code implementation for the ACL 2025 paper 'Dynamic Scaling of Unit Tests for Code Reward Modeling'
☆27 · May 16, 2025 · Updated last year
jiahaolu97 / anything-unsegmentable
(CVPR 2024) "Unsegment Anything by Simulating Deformation"
☆29 · May 27, 2024 · Updated 2 years ago
wmt-conference / wmt23-news-systems
☆14 · Oct 6, 2025 · Updated 9 months ago
MiaoXiong2320 / llm-uncertainty
Code repository for the ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs"
☆148 · Mar 14, 2024 · Updated 2 years ago
YujiaBao / R2A
"Deriving Machine Attention from Human Rationales" EMNLP 2018
☆26 · Feb 15, 2019 · Updated 7 years ago
zwy-Giser / MetroGAN
Data and code for MetroGAN
☆16 · Dec 23, 2024 · Updated last year
chorusai / brave
Brave is a simple visualisation library for NLP information extraction, built on top of embedded BRAT.
☆15 · Dec 25, 2019 · Updated 6 years ago
reds-lab / LAVA
Official repository for "LAVA: Data Valuation without Pre-Specified Learning Algorithms" (ICLR 2023).
☆54 · Jun 5, 2024 · Updated 2 years ago
BUPT-GAMMA / Graph-Structure-Estimation-Neural-Networks
Source code for the WWW 2021 paper "Graph Structure Estimation Neural Networks"
☆60 · Jul 15, 2021 · Updated 5 years ago
yf-he / EvoTest
EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems (ICLR'26)
☆24 · Nov 3, 2025 · Updated 8 months ago
KodCode-AI / code-r1
Reproducing R1 for Code with Reliable Rewards
☆13 · Apr 9, 2025 · Updated last year
alchemistyzz / PeRL
[NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning"
☆30 · Mar 30, 2026 · Updated 3 months ago
vivekmyers / tra-ogbench
☆18 · Feb 13, 2025 · Updated last year
MGitHubL / TMac
☆14 · Feb 26, 2024 · Updated 2 years ago
yingweima2022 / CodeLLM
☆12 · Jan 31, 2024 · Updated 2 years ago
GAIR-NLP / InnovatorBench
[ICLR 2026] InnovatorBench: Evaluating Agents' Ability to Conduct Innovative LLM Research
☆16 · Feb 3, 2026 · Updated 5 months ago
Ji-shuo / MRAgent
☆221 · Jun 8, 2026 · Updated last month
activatedgeek / calibration-tuning
☆53 · Apr 9, 2025 · Updated last year
MASWorks / ML-Agent
The official implementation of "ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering"
☆70 · Jun 21, 2025 · Updated last year
SunQingYun1996 / SUGAR
Code for "SUGAR: Subgraph Neural Network with Reinforcement Pooling and Self-Supervised Mutual Information Mechanism"
☆10 · Apr 17, 2021 · Updated 5 years ago
princeton-pli / PruLong
Code for the preprint "Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?"
☆48 · Jul 29, 2025 · Updated 11 months ago
ChenglinYu / BHN
☆10 · May 28, 2023 · Updated 3 years ago
mukhal / PromptRank
[ACL 2023] Few-shot Reranking for Multi-hop QA via Language Model Prompting
☆27 · Oct 19, 2025 · Updated 9 months ago
kailas-v / human-ai-interactions
☆11 · Oct 28, 2022 · Updated 3 years ago
feiyang-k / AutoScale
Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…
☆14 · Aug 8, 2025 · Updated 11 months ago
launchnlp / LitCab
☆25 · Jun 10, 2025 · Updated last year
YujieLu10 / Seeker
☆11 · May 24, 2024 · Updated 2 years ago
liushiliushi / ConfTuner
Official code for ConfTuner: Training Large Language Models to Express Their Confidence Verbally
☆27 · Sep 26, 2025 · Updated 9 months ago