CUHK-Shenzhen-SE / UTBoost
[ACL'25] UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench
☆19 · Updated this week
Alternatives and similar repositories for UTBoost
Users that are interested in UTBoost are comparing it to the libraries listed below
- ☆22 · Updated last week
- This repository contains the code for our ICML 2025 paper: LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection🎉 ☆22 · Updated last month
- ☆23 · Updated 2 weeks ago
- [PVLDB 2025] TAB: Unified Benchmarking of Time Series Anomaly Detection Methods ☆21 · Updated this week
- Embodied Intelligence in Endovascular Robot Navigation (embodied navigation for vascular interventional surgical robots) ☆14 · Updated last month
- [CVPR 2025] Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation ☆39 · Updated this week
- (ArXiv25) Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning ☆37 · Updated 2 weeks ago
- [ICML2025] Official Code of From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection ☆14 · Updated this week
- Awesome-Efficient-Inference-for-LRMs is a collection of state-of-the-art, novel, exciting, token-efficient methods for Large Reasoning Mo… ☆72 · Updated 2 weeks ago
- SFT+RL boosts multimodal reasoning ☆14 · Updated this week
- This is the repository of DEER, a Dynamic Early Exit in Reasoning method for Large Reasoning Language Models. ☆128 · Updated last week
- This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning! ☆45 · Updated 3 months ago
- [🏆AAAI2025] Official Repo for ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area. ☆31 · Updated 2 months ago
- [ICLR 2025] The official implementation of "PSEC: Skill Expansion and Composition in Parameter Space", a new framework designed to facilit… ☆28 · Updated 4 months ago
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen… ☆73 · Updated last week
- [arXiv 2025] Efficient Reasoning Models: A Survey ☆184 · Updated this week
- Chain of Thought (CoT) is so hot! So long! We need a short reasoning process! ☆54 · Updated 2 months ago
- ☆139 · Updated last month
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆75 · Updated 3 weeks ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆73 · Updated 4 months ago
- Official Repository of "Learning what reinforcement learning can't" ☆32 · Updated last week
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆46 · Updated last month
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models ☆24 · Updated 3 weeks ago
- [ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation" ☆64 · Updated 6 months ago
- ☆101 · Updated this week
- ☆119 · Updated last month
- Code release for VTW (AAAI 2025 Oral) ☆43 · Updated 5 months ago
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents ☆125 · Updated last month
- [ICCV 2025] Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation. ☆25 · Updated this week
- Provides .bst files for the NeurIPS LaTeX template ☆49 · Updated 2 months ago