shaochenze / calm
Official implementation of "Continuous Autoregressive Language Models"
☆79 · Updated this week
Alternatives and similar repositories for calm
Users interested in calm are comparing it to the repositories listed below.
- ☆281 · Updated 2 weeks ago
- Esoteric Language Models ☆104 · Updated last month
- PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind ☆129 · Updated last week
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆162 · Updated 6 months ago
- Official repo of the paper LM2 ☆46 · Updated 8 months ago
- [EMNLP 2025] The official implementation for the paper "Agentic-R1: Distilled Dual-Strategy Reasoning" ☆101 · Updated 2 months ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆222 · Updated this week
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆53 · Updated 7 months ago
- This is the official repository for Inheritune. ☆115 · Updated 8 months ago
- The open-source code of MetaStone-S1. ☆107 · Updated 3 months ago
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model" ☆51 · Updated 8 months ago
- Verifiers for LLM Reinforcement Learning ☆78 · Updated 6 months ago
- A repository for research on medium-sized language models. ☆78 · Updated last year
- Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible… ☆84 · Updated 3 weeks ago
- Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya ☆117 · Updated 3 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆35 · Updated 8 months ago
- Tiny re-implementation of MDM in the style of LLaDA and the nano-gpt speedrun ☆57 · Updated 7 months ago
- ☆86 · Updated last year
- [ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆45 · Updated 3 months ago
- GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 tr… ☆89 · Updated this week
- EvaByte: Efficient Byte-level Language Models at Scale ☆110 · Updated 6 months ago
- Code for ExploreTom ☆86 · Updated 4 months ago
- ☆86 · Updated 2 weeks ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆51 · Updated last week
- ☆50 · Updated last year
- ☆19 · Updated 8 months ago
- PyTorch implementation of models from the Zamba2 series. ☆185 · Updated 9 months ago
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models (https://arxiv.org/pdf/2411.02433) ☆108 · Updated 11 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 · Updated 6 months ago
- The official repository of "R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Integration" ☆120 · Updated 2 months ago