☆56Aug 5, 2025Updated 7 months ago
Alternatives and similar repositories for UserBench
Users that are interested in UserBench are comparing it to the libraries listed below
- ☆20Nov 3, 2024Updated last year
- This is the source code of FUSION, a safety-aware causal representation for generalizable driving agents.☆26Oct 23, 2024Updated last year
- [ICLR 2026] Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"☆123Feb 2, 2026Updated last month
- Evaluating Durability: Benchmark Insights into Multimodal Watermarking☆12Jun 7, 2024Updated last year
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning (ICML 2024). Current SOTA model-free safe RL algorithm on …☆14Jul 12, 2024Updated last year
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models☆38Feb 27, 2024Updated 2 years ago
- ☆18Jan 5, 2025Updated last year
- Train and visualise a latent variable model of moving objects.☆16Apr 28, 2020Updated 5 years ago
- The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models"☆31Mar 5, 2026Updated 2 weeks ago
- Efficient Scaling laws and collaborative pretraining.☆21Sep 18, 2025Updated 6 months ago
- Paper: “MEMRL: SELF-EVOLVING AGENTS VIA RUNTIME REINFORCEMENT LEARNING ON EPISODIC MEMORY” Open-Source Code☆55Feb 27, 2026Updated 3 weeks ago
- ☆78Nov 6, 2025Updated 4 months ago
- ☆15Dec 12, 2024Updated last year
- [ICLR 2025] Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning.☆85Aug 11, 2025Updated 7 months ago
- AMR-parser. Code for EMNLP2019 paper "Core Semantic First: A Top-down Approach for AMR Parsing."☆11Feb 23, 2020Updated 6 years ago
- ☆78Updated this week
- Code for the arxiv paper: Complex Claim Verification with Evidence Retrieved in the Wild☆13Nov 27, 2023Updated 2 years ago
- Benchmarking Retrieval-Augmented Generation in Multi-Turn Legal Consultation Conversation☆38Mar 3, 2025Updated last year
- Toolathlon-Gym for testing AI agents real-world tool-use capabilities across diverse MCP servers.☆71Mar 12, 2026Updated last week
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…☆64Jun 13, 2025Updated 9 months ago
- An Open-Source Reinforcement Learning Framework for Robot-Task Environments☆27Jul 6, 2023Updated 2 years ago
- ☆44Feb 27, 2026Updated 3 weeks ago
- Text generation using language models with multiple exit heads☆16Sep 18, 2025Updated 6 months ago
- This project is the e-book edition of July's 《程序员编程艺术》 (The Art of Programming for Programmers)☆11Jan 9, 2014Updated 12 years ago
- ⚠️ ARCHIVED - All development moved to https://github.com/itbench-hub/ITBench/tree/main/scenarios☆15Feb 24, 2026Updated 3 weeks ago
- Paper Reading Summary (mainly NLP-related papers)☆11Nov 6, 2019Updated 6 years ago
- ☆23Oct 30, 2025Updated 4 months ago
- ☆44Oct 1, 2024Updated last year
- ☆26Oct 27, 2025Updated 4 months ago
- Bayes-Adaptive RL for LLM Reasoning☆46May 28, 2025Updated 9 months ago
- [WWW '24] UnifiedSSR: A Unified Framework of Sequential Search and Recommendation☆12Feb 16, 2024Updated 2 years ago
- ☆12Feb 6, 2021Updated 5 years ago
- [arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies☆60Feb 6, 2026Updated last month
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces…☆84Jan 12, 2025Updated last year
- ☆21Nov 5, 2024Updated last year
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆86Jan 21, 2026Updated last month
- Code repo for FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs.☆32Nov 5, 2025Updated 4 months ago
- The code for HerO: a fact-checking pipeline based on open LLMs (the runner-up in AVeriTeC)☆14Mar 18, 2025Updated last year
- [CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling☆207Mar 9, 2026Updated last week