Timothyxxx/TestTimeTrainingPapers

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Timothyxxx/TestTimeTrainingPapers)

Timothyxxx / TestTimeTrainingPapers

☆59

Alternatives and similar repositories for TestTimeTrainingPapers

Users that are interested in TestTimeTrainingPapers are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

evolvent-ai / Terrarium
View on GitHub
Terrarium: Multi-turn data engine for evaluating and optimizing LLM agents in living environments.
☆53Jul 14, 2026Updated last week
evolvent-ai / ClawMark
View on GitHub
🦞 ClawMark: A Living-World Benchmark for Multi-Day, Multimodal Coworker Agents
☆116May 28, 2026Updated last month
xlang-ai / OSWorld-V2
View on GitHub
OSWorld 2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks
☆196Jul 9, 2026Updated last week
GAIR-NLP / OctoThinker
View on GitHub
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆189Jul 23, 2025Updated 11 months ago
cdhx / MarkQA
View on GitHub
Code and data for EMNLP 2023 research track paper "MarkQA: A large scale KBQA dataset with numerical reasoning"
☆13Jan 2, 2024Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
GAIR-NLP / ProX
View on GitHub
[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
☆271Jul 8, 2025Updated last year
ModalMinds / gym-v
View on GitHub
A unified framework for vision-language environments with Gymnasium-compatible interface
☆35Mar 17, 2026Updated 4 months ago
GAIR-NLP / self-improvement-reversal
View on GitHub
☆13Jul 14, 2024Updated 2 years ago
koalazf99 / Awesome-DataCentric-LLM
View on GitHub
Trending projects & awesome papers about data-centric llm studies.
☆40May 20, 2025Updated last year
chenlong-clock / RULE-Unlearn
View on GitHub
[NeurIPS25] RULE: Reinforcement UnLEarning Achieves Forge-retain Pareto Optimality
☆20Oct 22, 2025Updated 8 months ago
GaryStack / Trustworthy-Evaluation
View on GitHub
Repository of paper "Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis" (ACL 2025 Main)
☆19Jul 19, 2025Updated last year
xlang-ai / OSWorld-G
View on GitHub
[NeurIPS 2025 Spotlight] Scaling Computer-Use Grounding via UI Decomposition and Synthesis
☆172Jun 18, 2026Updated last month
cdhx / QDTQA
View on GitHub
Code for AAAI 2023 research track paper "Question Decomposition Tree for Answering Complex Questions over Knowledge Bases"
☆17Jan 3, 2024Updated 2 years ago
HXX97 / GMT-KBQA
View on GitHub
Code and data for GMT-KBQA
☆17Jan 5, 2023Updated 3 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
cxcscmu / Montessori-Instruct
View on GitHub
Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning [ICLR 2025]
☆51Jan 24, 2025Updated last year
sail-sg / feedback-conditional-policy
View on GitHub
Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"
☆65Jan 5, 2026Updated 6 months ago
HazyResearch / wonderbread
View on GitHub
WONDERBREAD benchmark + dataset for BPM tasks
☆35Jul 30, 2025Updated 11 months ago
GAIR-NLP / weak-to-strong-reasoning
View on GitHub
☆59Sep 2, 2024Updated last year
jinzhuoran / RAG-RewardBench
View on GitHub
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆18Dec 19, 2024Updated last year
GAIR-NLP / BeHonest
View on GitHub
BeHonest: Benchmarking Honesty in Large Language Models
☆35Aug 15, 2024Updated last year
axon-rl / gem
View on GitHub
A Gym for Agentic LLMs
☆502Jan 21, 2026Updated 6 months ago
sudanl / kNN-TL
View on GitHub
kNN-TL: k-Nearest-Neighbor Transfer Learning for Low-Resource Neural Machine Translation (ACL2023)
☆11Jul 26, 2023Updated 2 years ago
tml-epfl / icl-alignment
View on GitHub
Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]
☆33Jan 23, 2025Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
Trae1ounG / BuPO
View on GitHub
[arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
☆60Feb 6, 2026Updated 5 months ago
xlang-ai / computer-agent-arena
View on GitHub
[ICLR 2026] Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents
☆66Feb 26, 2026Updated 4 months ago
BytedTsinghua-SIA / Enigmata
View on GitHub
Resources for the Enigmata Project.
☆82Aug 13, 2025Updated 11 months ago
holarissun / RewardModelingBeyondBradleyTerry
View on GitHub
official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…
☆73Apr 2, 2025Updated last year
gouki510 / Topology_of_Reasoning
View on GitHub
☆42Jun 11, 2025Updated last year
yunfeixie233 / ViGaL
View on GitHub
☆70Feb 4, 2026Updated 5 months ago
Interplay-LM-Reasoning / Interplay-LM-Reasoning
View on GitHub
[ICML 2026 Spotlight] On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
☆162Jun 8, 2026Updated last month
upiterbarg / lintseq
View on GitHub
[ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus)
☆19Feb 11, 2025Updated last year
xlang-ai / CUA-Gym
View on GitHub
Scalable pipeline for synthesizing verifiable RLVR training data for computer-use agents
☆177May 26, 2026Updated last month
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
GAIR-NLP / daVinci-Dev
View on GitHub
[ICML 2026 Oral] Agent-native Mid-training for Software Engineering
☆71Jun 7, 2026Updated last month
HKUNLP / multilingual-transfer
View on GitHub
Code for paper ”Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability“
☆15Jun 13, 2023Updated 3 years ago
Xiaoyu-SZ / LLMasEvaluator
View on GitHub
Large Language Models as Evaluators for Recommendation Explanations (RecSys 2024 Reproducibility)
☆21Aug 13, 2025Updated 11 months ago
zhaoxlpku / DynaAct
View on GitHub
☆15Nov 12, 2025Updated 8 months ago
JinjieNi / Quokka
View on GitHub
The official github repo for "Training Optimal Large Diffusion Language Models", the first-ever large-scale diffusion language models sca…
☆46Nov 6, 2025Updated 8 months ago
GAIR-NLP / Med
View on GitHub
[ICML 2026] What Does Vision Tool-Use Reinforcement Learning Really Learn? Disentangling Tool-Induced and Intrinsic Effects for Crop-and-…
☆21May 15, 2026Updated 2 months ago
cjj826 / GoalAct
View on GitHub
The repo for our paper: Enhancing LLM-Based Agents via Global Planning and Hierarchical Execution (NCIIP 2025 Best Paper)
☆17Aug 18, 2025Updated 11 months ago