Infini-AI-Lab / Multiverse
☆98 · Updated last month
Alternatives and similar repositories for Multiverse
Users that are interested in Multiverse are comparing it to the libraries listed below
- [NeurIPS-2024] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 · ☆88 · Updated last year
- ☆85 · Updated 9 months ago
- ☆62 · Updated 3 months ago
- ☆93 · Updated 7 months ago
- The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink… · ☆95 · Updated last month
- ☆55 · Updated 4 months ago
- Code for "Reasoning to Learn from Latent Thoughts" · ☆121 · Updated 6 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) · ☆128 · Updated 3 months ago
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification · ☆65 · Updated 3 months ago
- [EMNLP'2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" · ☆65 · Updated 6 months ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models · ☆132 · Updated 2 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling · ☆177 · Updated 3 months ago
- AnchorAttention: Improved attention for LLMs long-context training · ☆213 · Updated 9 months ago
- ☆61 · Updated 3 months ago
- ☆80 · Updated 4 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" · ☆102 · Updated last week
- An efficient implementation of the NSA (Native Sparse Attention) kernel · ☆119 · Updated 3 months ago
- Kinetics: Rethinking Test-Time Scaling Laws · ☆81 · Updated 3 months ago
- ☆104 · Updated last month
- ☆60 · Updated last week
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25] · ☆91 · Updated 6 months ago
- [NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models · ☆106 · Updated 5 months ago
- SIFT: Grounding LLM Reasoning in Contexts via Stickers · ☆58 · Updated 7 months ago
- Implementation for FP8/INT8 Rollout for RL training without performance drop. · ☆260 · Updated 3 weeks ago
- Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? · ☆115 · Updated last year
- The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond · ☆170 · Updated 3 months ago
- ☆119 · Updated 4 months ago
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization · ☆76 · Updated 3 weeks ago
- A repo for open research on building large reasoning models · ☆107 · Updated last week
- MiroTrain is an efficient and algorithm-first framework for post-training large agentic models. · ☆88 · Updated last month