☆30 · Dec 31, 2025 · Updated 2 months ago
Alternatives and similar repositories for hybrid-distillation
Users who are interested in hybrid-distillation are comparing it to the libraries listed below.
- Use the tokenizer in parallel to achieve superior acceleration ☆20 · Mar 21, 2024 · Updated last year
- 🔥 A minimal training framework for scaling FLA models ☆352 · Nov 15, 2025 · Updated 3 months ago
- ☆227 · Nov 19, 2025 · Updated 3 months ago
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without… ☆21 · Mar 15, 2025 · Updated 11 months ago
- Code and data for paper "(How) do Language Models Track State?" ☆22 · Mar 31, 2025 · Updated 11 months ago
- ☆66 · Jul 8, 2025 · Updated 8 months ago
- Official implementation of Log-linear Sparse Attention (LLSA). ☆58 · Feb 2, 2026 · Updated last month
- Stick-breaking attention ☆62 · Jul 1, 2025 · Updated 8 months ago
- ☆131 · Jun 6, 2025 · Updated 9 months ago
- Experiments on the impact of depth in transformers and SSMs. ☆41 · Oct 23, 2025 · Updated 4 months ago
- Official Code Repository for the paper "Key-value memory in the brain" ☆31 · Feb 25, 2025 · Updated last year
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆141 · Feb 25, 2026 · Updated last week
- ☆12 · Nov 3, 2024 · Updated last year
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models ☆46 · Jul 17, 2025 · Updated 7 months ago
- Linear Attention Sequence Parallelism (LASP) ☆89 · Jun 4, 2024 · Updated last year
- [CVPR 2026] Official repo for "VideoSSR: Video Self-Supervised Reinforcement Learning" ☆33 · Nov 11, 2025 · Updated 3 months ago
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence ☆10 · Mar 2, 2025 · Updated last year
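- Re-identification of campus bicycles under surveillance-camera image quality, including ReID model recognition, vector database retrieval, and a UI display ☆10 · Feb 13, 2024 · Updated 2 years ago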
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆92 · Updated this week
- An implementation of the paper "Retentive Network: A Successor to Transformer for Large Language Models" https://arxiv.org/pdf/2307.08621.pdf ☆11 · Jul 25, 2023 · Updated 2 years ago
- Implementation of Reinforce for educational purposes. ☆12 · Jun 12, 2023 · Updated 2 years ago
- Repository for the implementation of our work on hypergraph generation as part of the ANR project "SODA". ☆13 · Oct 27, 2025 · Updated 4 months ago
- A NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library. ☆98 · Dec 17, 2025 · Updated 2 months ago
- Efficient retrieval head analysis with triton flash attention that supports topK probability ☆13 · Jun 15, 2024 · Updated last year
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 · Aug 9, 2025 · Updated 6 months ago
- LongAttn: Selecting Long-context Training Data via Token-level Attention ☆15 · Jul 16, 2025 · Updated 7 months ago
- Linear-complexity Private Function Evaluation (PFE) based on homomorphic encryption (as presented at ESORICS'20). ☆10 · Sep 14, 2020 · Updated 5 years ago
- Bridging Retrieval and Inference through Evidence Fusion ☆12 · Oct 20, 2025 · Updated 4 months ago
- Fully open reproduction of DeepSeek-R1 ☆11 · Mar 24, 2025 · Updated 11 months ago
- Unofficial implementation of the paper: Exploring the Space of Key-Value-Query Models with Intention ☆12 · May 24, 2023 · Updated 2 years ago
- High-performance tokenized language data loader for Python, built as a C++ extension ☆14 · Jul 22, 2024 · Updated last year
- A Tensorflow.keras implementation of Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorizatio… ☆10 · Dec 18, 2019 · Updated 6 years ago
- ☆12 · Jun 15, 2023 · Updated 2 years ago
- ☆10 · Dec 18, 2023 · Updated 2 years ago
- ☆11 · Feb 26, 2024 · Updated 2 years ago
- The 30 must-read deep learning papers recommended by Ilya Sutskever (Chinese-English parallel translation edition) ☆13 · Dec 18, 2024 · Updated last year
- A toolkit that helps developers simplify the transformation of nn.Module instances. It now corresponds to Pytorch.fx. ☆13 · Apr 7, 2023 · Updated 2 years ago
- ☆12 · Jan 29, 2021 · Updated 5 years ago
- Spectral Sphere Optimizer ☆104 · Jan 14, 2026 · Updated last month