zaydzuhri / flame
Fork of Flame repo for training of some new stuff in development
☆19 · Jan 5, 2026 · Updated last month
Alternatives and similar repositories for flame
Users that are interested in flame are comparing it to the libraries listed below
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆87 · Sep 12, 2025 · Updated 5 months ago
- ☆21 · Jul 21, 2025 · Updated 6 months ago
- Universal Neurons in GPT2 Language Models ☆30 · May 28, 2024 · Updated last year
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆35 · Jun 12, 2024 · Updated last year
- This repository includes the code to download the curated HuggingFace papers into a single markdown formatted file ☆16 · Jul 26, 2024 · Updated last year
- ☆14 · Oct 4, 2024 · Updated last year
- ☆19 · Aug 4, 2025 · Updated 6 months ago
- ☆16 · Oct 20, 2025 · Updated 3 months ago
- ☆12 · Feb 5, 2026 · Updated last week
- [ICLR 2025 Oral] Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition ☆17 · Nov 25, 2024 · Updated last year
- Transmute AI Lab Model Efficiency Toolkit ☆19 · Oct 2, 2023 · Updated 2 years ago
- MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following ☆16 · Oct 31, 2024 · Updated last year
- Experiments to assess SPADE on different LLM pipelines. ☆17 · Apr 7, 2024 · Updated last year
- Transformer related optimization, including BERT, GPT ☆14 · Jun 27, 2023 · Updated 2 years ago
- ☆23 · Jan 27, 2025 · Updated last year
- Official Repository for Task-Circuit Quantization ☆24 · Jun 1, 2025 · Updated 8 months ago
- Implementation for robust ViT and scaled attention ☆21 · Apr 4, 2025 · Updated 10 months ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". ☆30 · Nov 12, 2024 · Updated last year
- Single-pass Adaptive Image Tokenization for Minimum Program Search | What's the Kolmogorov Complexity of an Image? ☆42 · Jul 26, 2025 · Updated 6 months ago
- The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization" ☆21 · Nov 9, 2025 · Updated 3 months ago
- ☆60 · Jan 8, 2026 · Updated last month
- ☆38 · Oct 31, 2025 · Updated 3 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆157 · Apr 7, 2025 · Updated 10 months ago
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 · Aug 3, 2024 · Updated last year
- Masked Structural Growth for 2x Faster Language Model Pre-training