Official JAX implementation of End-to-End Test-Time Training for Long Context
☆542 · Feb 15, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for e2e
Users who are interested in e2e are comparing it to the libraries listed below.
- An automated data pipeline scaling RL to pretraining levels · ☆72 · Oct 11, 2025 · Updated 4 months ago
- Recursive Bayesian Networks · ☆11 · May 11, 2025 · Updated 9 months ago
- Code release for paper "Test-Time Training Done Right" · ☆379 · Jan 5, 2026 · Updated last month
- The first open-domain closed-loop revisited benchmark for evaluating memory consistency and action control in world models. · ☆41 · Feb 10, 2026 · Updated 3 weeks ago
- Storing long contexts in tiny caches with self-study · ☆243 · Dec 5, 2025 · Updated 2 months ago
- Stable-DiffCoder is a family of lightweight open-source code DLLMs (diffusion large language models) comprising base and instruct models, … · ☆75 · Jan 23, 2026 · Updated last month
- OmniGAIA: Towards Native Omni-Modal AI Agents · ☆46 · Updated this week
- ☆134 · May 29, 2025 · Updated 9 months ago
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling · ☆472 · May 17, 2025 · Updated 9 months ago
- ☆399 · Nov 7, 2025 · Updated 3 months ago
- Official Project Page for HLA: Higher-order Linear Attention (https://arxiv.org/abs/2510.27258) · ☆45 · Jan 6, 2026 · Updated last month
- Crawl & visualize ICLR papers and reviews. · ☆18 · Nov 5, 2022 · Updated 3 years ago
- Research work aimed at addressing the problem of modeling infinite-length context · ☆46 · Dec 18, 2025 · Updated 2 months ago
- Implementations of and experiments with mHC by DeepSeek (https://arxiv.org/abs/2512.24880) · ☆317 · Feb 17, 2026 · Updated 2 weeks ago
- ☆468 · Feb 22, 2026 · Updated last week
- An efficient implementation of the NSA (Native Sparse Attention) kernel · ☆129 · Jun 24, 2025 · Updated 8 months ago
- Residual Context Diffusion (RCD): Repurposing discarded signals as structured priors for high-performance reasoning in dLLMs. · ☆54 · Feb 11, 2026 · Updated 3 weeks ago
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones · ☆64 · Jan 26, 2026 · Updated last month
- ☆30 · Jun 7, 2025 · Updated 8 months ago
- 🚀 Efficient implementations of state-of-the-art linear attention models · ☆4,428 · Updated this week
- This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM · ☆60 · May 28, 2024 · Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer · ☆237 · Jun 15, 2025 · Updated 8 months ago
- MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence · ☆55 · Feb 10, 2026 · Updated 3 weeks ago
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs. · ☆41 · Oct 31, 2025 · Updated 4 months ago
- ☆80 · Mar 11, 2025 · Updated 11 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) · ☆24 · Jun 6, 2024 · Updated last year
- A Self-adaptation Framework 🐙 that adapts LLMs for unseen tasks in real-time! · ☆1,189 · Jan 30, 2025 · Updated last year
- [CVPR 2026] FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection · ☆25 · Feb 10, 2026 · Updated 3 weeks ago
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism" · ☆14 · Apr 30, 2025 · Updated 10 months ago
- ☆10 · Oct 28, 2020 · Updated 5 years ago
- ☆12 · Nov 21, 2023 · Updated 2 years ago
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- (Siggraph Asia 2023) Project Page of "HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image" · ☆10 · Dec 9, 2023 · Updated 2 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based … · ☆11 · Mar 18, 2023 · Updated 2 years ago
- ThetaEvolve: Test-time Learning on Open Problems, enabling RL training on AlphaEvolve/OpenEvolve and emphasizing scaling test-time comput… · ☆132 · Updated this week
- Minimal Transformer base in JAX. A single backbone for language modelling, diffusion, classification, etc. · ☆14 · May 28, 2025 · Updated 9 months ago
- Langchain + Docker + Neo4j · ☆10 · Oct 29, 2024 · Updated last year
- Quantized Attention on GPU · ☆44 · Nov 22, 2024 · Updated last year