Infini-AI-Lab / Multiverse
☆101 · Updated 2 months ago
Alternatives and similar repositories for Multiverse
Users who are interested in Multiverse are comparing it to the libraries listed below.
- ☆61 · Updated 4 months ago
- [NeurIPS-2024] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 · ☆89 · Updated last year
- ☆55 · Updated 5 months ago
- [EMNLP'2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" · ☆66 · Updated 7 months ago
- The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink… · ☆101 · Updated last month
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models · ☆132 · Updated 2 months ago
- ☆96 · Updated 8 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling · ☆179 · Updated 3 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel · ☆124 · Updated 4 months ago
- ☆85 · Updated this week
- Code for "Reasoning to Learn from Latent Thoughts" · ☆122 · Updated 7 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling · ☆40 · Updated 3 weeks ago
- SIFT: Grounding LLM Reasoning in Contexts via Stickers · ☆58 · Updated 8 months ago
- ☆66 · Updated 4 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) · ☆134 · Updated 4 months ago
- ☆71 · Updated 3 months ago
- ☆120 · Updated 5 months ago
- [ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. · ☆98 · Updated 10 months ago
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization · ☆78 · Updated last month
- ☆108 · Updated last year
- AnchorAttention: Improved attention for LLMs long-context training · ☆213 · Updated 9 months ago
- ☆81 · Updated 4 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" · ☆104 · Updated last month
- ☆106 · Updated last month
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study · ☆55 · Updated 11 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25] · ☆198 · Updated 2 weeks ago
- ☆61 · Updated 3 weeks ago
- Kinetics: Rethinking Test-Time Scaling Laws · ☆82 · Updated 4 months ago
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification · ☆67 · Updated 3 months ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization · ☆42 · Updated 8 months ago