☆58Aug 5, 2025Updated 8 months ago
Alternatives and similar repositories for UserBench
Users who are interested in UserBench are comparing it to the libraries listed below.
- The raw UserRL repo under construction☆97Sep 25, 2025Updated 6 months ago
- ☆20Nov 3, 2024Updated last year
- Functional Optimal Transport: Map Estimation and Domain Adaptation for Functional data☆27Jun 7, 2021Updated 4 years ago
- This is the source code of FUSION, a safety-aware causal representation for generalizable driving agents.☆26Oct 23, 2024Updated last year
- An open-source framework to benchmark and assess safety specifications of Reinforcement Learning problems.☆14Aug 25, 2023Updated 2 years ago
- Companion code to https://arxiv.org/abs/2402.15491☆22Sep 18, 2025Updated 6 months ago
- [ICLR 2026] Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"☆126Feb 2, 2026Updated 2 months ago
- Evaluating Durability: Benchmark Insights into Multimodal Watermarking☆12Jun 7, 2024Updated last year
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning (ICML 2024). Current SOTA model-free safe RL algorithm on …☆15Jul 12, 2024Updated last year
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models☆39Feb 27, 2024Updated 2 years ago
- ☆17Jan 5, 2025Updated last year
- Plancraft is a Minecraft environment and agent suite to test planning capabilities in LLMs☆27Nov 7, 2025Updated 5 months ago
- ☆30Jan 25, 2026Updated 2 months ago
- Data Augmentation for Supervised Graph Outlier Detection with Latent Diffusion Models☆14Sep 3, 2025Updated 7 months ago
- ☆79Nov 6, 2025Updated 5 months ago
- Source code repository for our EMNLP paper on cross-domain claim identification☆14Oct 24, 2018Updated 7 years ago
- [CHIL 2024] Interpretation of Intracardiac Electrograms Through Textual Representations☆12Sep 4, 2024Updated last year
- ☆80Updated this week
- AMR-parser. Code for EMNLP2019 paper "Core Semantic First: A Top-down Approach for AMR Parsing."☆11Feb 23, 2020Updated 6 years ago
- Code for the arxiv paper: Complex Claim Verification with Evidence Retrieved in the Wild☆13Nov 27, 2023Updated 2 years ago
- An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models☆15Feb 27, 2025Updated last year
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…☆66Jun 13, 2025Updated 9 months ago
- Benchmark dataset for the paper "Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with …☆24May 20, 2025Updated 10 months ago
- Paper: “MEMRL: SELF-EVOLVING AGENTS VIA RUNTIME REINFORCEMENT LEARNING ON EPISODIC MEMORY” Open-Source Code☆75Feb 27, 2026Updated last month
- ☆43Feb 27, 2026Updated last month
- ☆11May 29, 2025Updated 10 months ago
- Toolathlon-Gym for testing AI agents real-world tool-use capabilities across diverse MCP servers.☆104Apr 2, 2026Updated last week
- Multi-agent synthetic data generation pipeline capable of generating and validating long horizon terminal/coding tasks for RL training☆59Jul 28, 2025Updated 8 months ago
- ⚠️ ARCHIVED - All development moved to https://github.com/itbench-hub/ITBench/tree/main/scenarios☆15Feb 24, 2026Updated last month
- ☆23Oct 30, 2025Updated 5 months ago
- ☆44Oct 1, 2024Updated last year
- ☆26Oct 27, 2025Updated 5 months ago
- Bayes-Adaptive RL for LLM Reasoning☆46May 28, 2025Updated 10 months ago
- [arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies☆60Feb 6, 2026Updated 2 months ago
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces…☆86Jan 12, 2025Updated last year
- ☆21Nov 5, 2024Updated last year
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆87Jan 21, 2026Updated 2 months ago
- Code repo for FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs.☆32Nov 5, 2025Updated 5 months ago
- The code for HerO: a fact-checking pipeline based on open LLMs (the runner-up in AVeriTeC)☆15Mar 18, 2025Updated last year