sylvain-wei / TIME
[NeurIPS 2025 D&B (Spotlight)] TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenario
★29 Updated 4 months ago
Alternatives and similar repositories for TIME
Users that are interested in TIME are comparing it to the libraries listed below
- Official repository for ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use ★29 Updated 3 months ago
- Resources and paper list for 'Scaling Environments for Agents'. This repository accompanies our survey on how environments contribute to … ★58 Updated last week
- [ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation" ★93 Updated last year
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning ★97 Updated 11 months ago
- ★103 Updated 3 weeks ago
- ★178 Updated 2 months ago
- ★16 Updated 3 months ago
- This is the repository of DEER, a Dynamic Early Exit in Reasoning method for Large Reasoning Language Models. ★180 Updated 7 months ago
- [COLM'25] Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? ★37 Updated 8 months ago
- Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process! ★72 Updated 10 months ago
- Official Repository of LatentSeek ★76 Updated 8 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ★91 Updated 11 months ago
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen… ★85 Updated 7 months ago
- MemGen: Weaving Generative Latent Memory for Self-Evolving Agents ★298 Updated this week
- The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization" ★21 Updated 3 months ago
- JudgeLRM: Large Reasoning Models as a Judge ★40 Updated last week
- [ACL'25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models. ★88 Updated 11 months ago
- [NeurIPS 2025@FoRLM] R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search ★17 Updated 2 weeks ago
- The official implementation of the paper "Mem-α: Learning Memory Construction via Reinforcement Learning" ★164 Updated last month
- [FSE'2026] SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks ★144 Updated 2 weeks ago
- [NeurIPS 2025] Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains ★74 Updated 6 months ago
- ★45 Updated last month
- ★63 Updated 6 months ago
- Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows" ★122 Updated last week
- A curated list of awesome resources focusing on Context Compression techniques for Large Language Models (LLMs). ★57 Updated 3 weeks ago
- A Unified Framework for High-Performance and Extensible LLM Steering ★163 Updated last week
- ★43 Updated 5 months ago
- Research works from Tencent AI Lab regarding self-evolving agents ★81 Updated last week
- [EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs ★201 Updated 2 months ago
- ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of… ★76 Updated 8 months ago