lliu606 / COSMOS
☆14 · Updated 9 months ago
Alternatives and similar repositories for COSMOS
Users interested in COSMOS are comparing it to the libraries listed below.
- ☆89 · Updated last year
- ☆53 · Updated last year
- ☆53 · Updated last year
- Token Omission Via Attention ☆128 · Updated last year
- ☆75 · Updated last year
- [ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications ☆52 · Updated last month
- ☆110 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 · Updated last year
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆33 · Updated last year
- A repository for research on medium sized language models. ☆77 · Updated last year
- Simple and efficient pytorch-native transformer training and inference (batched) ☆79 · Updated last year
- [CoLM 24] Official Repository of MambaByte: Token-free Selective State Space Model ☆24 · Updated last year
- EvaByte: Efficient Byte-level Language Models at Scale ☆111 · Updated 7 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆157 · Updated 8 months ago
- Official Code Repository for the paper "Key-value memory in the brain" ☆31 · Updated 9 months ago
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆85 · Updated 3 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆106 · Updated 2 months ago
- Official repo of dataset-decomposition paper [NeurIPS 2024] ☆20 · Updated 11 months ago
- Experiments on the impact of depth in transformers and SSMs. ☆38 · Updated last month
- GoldFinch and other hybrid transformer components ☆45 · Updated last year
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆86 · Updated last year
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆35 · Updated 9 months ago
- [ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers ☆74 · Updated 5 months ago
- This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024. ☆105 · Updated last year
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆40 · Updated 2 months ago
- Stick-breaking attention ☆62 · Updated 5 months ago
- RADLADS training code ☆35 · Updated 7 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆180 · Updated 5 months ago
- ☆42 · Updated last year
- [ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference" ☆21 · Updated 10 months ago