CL-bench: A Benchmark for Context Learning
☆478Feb 8, 2026Updated last month
Alternatives and similar repositories for CL-bench
Users that are interested in CL-bench are comparing it to the libraries listed below.
- [ICML 2025] Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling☆12May 5, 2025Updated 10 months ago
- Model Merging with Functional Dual Anchors☆47Nov 23, 2025Updated 4 months ago
- Official repository for the paper "Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation"☆61Updated this week
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆219Oct 12, 2025Updated 5 months ago
- Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations☆22Dec 24, 2025Updated 2 months ago
- From Word to World: Can Large Language Models be Implicit Text-based World Models?☆55Dec 25, 2025Updated 2 months ago
- Orienting Latent Actions for Video World Modeling☆83Feb 11, 2026Updated last month
- [ICLR 2026] The implementation of paper "AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint"☆43Nov 20, 2025Updated 4 months ago
- VS Code extension to access https://chat.deepseek.com/ within VS code sidebar. Both DeepSeekChat and code in the same editor, side by sid…☆14Jan 21, 2025Updated last year
- Learning on the Job: An Experience-Driven, Self-Evolving Agent for Long-Horizon Tasks☆85Oct 16, 2025Updated 5 months ago
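- Question bank for the National College Student Software Testing Competition (2016~2024), covering problems from every stage of the international and domestic contests; compiled by students, with some gaps☆14Nov 17, 2024Updated last year
- Benchmarking Large Language Models' Resistance to Malicious Code☆14Dec 1, 2024Updated last year
- Chinese large language model evaluation: 2024 Gaokao mathematics special topic☆19Jun 14, 2024Updated last year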
- Code accompanying the NeurIPS 2019 paper AutoAssist: A Framework to Accelerate Training of Deep Neural Networks.☆14Oct 3, 2022Updated 3 years ago
- Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation"☆21Jan 31, 2026Updated last month
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs.☆43Oct 31, 2025Updated 4 months ago
- The official repository of MM-R5☆29Jun 22, 2025Updated 9 months ago
- ML4CO-Bench-101: Benchmark Machine Learning for Classic Combinatorial Problems on Graphs.☆42Nov 17, 2025Updated 4 months ago
- MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research☆24Sep 23, 2025Updated 6 months ago
- ConvGQR: Generative Query Reformulation for Conversational Search. A codebase for ACL 2023 accepted paper.☆34Mar 5, 2024Updated 2 years ago
- Official Implementation for the paper "Integrative Decoding: Improving Factuality via Implicit Self-consistency"☆32Apr 12, 2025Updated 11 months ago
- [EMNLP 2023] Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation☆31Oct 18, 2025Updated 5 months ago
- TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment☆10Mar 1, 2025Updated last year
- Source code for the NAACL 2021 paper: "Distantly Supervised Relation Extraction with Sentence Reconstruction and Knowledge Base Priors"☆12Jul 15, 2021Updated 4 years ago
- [COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free☆55Apr 6, 2025Updated 11 months ago
- The code repository of paper "TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities"☆20Dec 24, 2024Updated last year
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton☆45Feb 13, 2025Updated last year
- G-Refer: Graph Retrieval-Augmented Large Language Model for Explainable Recommendation☆20Mar 5, 2025Updated last year
- [NeurIPS DB 2025] IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering☆45Oct 15, 2025Updated 5 months ago
- GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators☆50Dec 23, 2025Updated 3 months ago
- AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)☆424Oct 15, 2025Updated 5 months ago
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones☆64Jan 26, 2026Updated last month
- Here we will test various linear attention designs.☆62Apr 25, 2024Updated last year