★33 · Dec 31, 2025 · Updated 5 months ago
Alternatives and similar repositories for hybrid-distillation
Users that are interested in hybrid-distillation are comparing it to the libraries listed below.
- ★249 · Nov 19, 2025 · Updated 6 months ago
- 🔥 A minimal training framework for scaling FLA models ★392 · Apr 22, 2026 · Updated last month
- Use the tokenizer in parallel to achieve superior acceleration ★20 · Mar 21, 2024 · Updated 2 years ago
- ★61 · Jul 9, 2024 · Updated last year
- [ICLR 2026] GRAPE: Group Representational Position Encoding (https://arxiv.org/abs/2512.07805) ★95 · May 13, 2026 · Updated last month
- Stick-breaking attention ★63 · Jul 1, 2025 · Updated 11 months ago
- ★137 · Jun 6, 2025 · Updated last year
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ★149 · Feb 25, 2026 · Updated 3 months ago
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without… ★22 · Mar 15, 2025 · Updated last year
- [CVPR 2026 Highlight] Official implementation of Log-linear Sparse Attention (LLSA). ★86 · May 1, 2026 · Updated last month
- ★15 · Nov 3, 2024 · Updated last year
- Open-source toolkit for training, Priming, and serving next generation Hybrid architectures ★72 · Updated this week
- ★12 · Jan 29, 2021 · Updated 5 years ago
- code for paper "Accessing higher dimensions for unsupervised word translation" ★22 · Jun 26, 2023 · Updated 2 years ago
- Official Code Repository for the paper "Key-value memory in the brain" ★32 · Feb 25, 2025 · Updated last year
- Source code and dataset for the paper 'Saamayik: A Benchmark and Dataset for English-Sanskrit Translation' ★15 · Oct 11, 2025 · Updated 8 months ago
- ★45 · Nov 1, 2025 · Updated 7 months ago
- Urban Cup 2023 ★16 · Aug 2, 2023 · Updated 2 years ago
- Linear Attention Sequence Parallelism (LASP) ★88 · Jun 4, 2024 · Updated 2 years ago
- Efficient retrieval head analysis with triton flash attention that supports topK probability ★13 · Jun 15, 2024 · Updated 2 years ago
- Engine for collecting, uploading, and downloading model activations ★29 · Apr 2, 2025 · Updated last year
- ★48 · Jun 16, 2025 · Updated last year
- Official Repo for Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics ★76 · Mar 26, 2026 · Updated 2 months ago
- A public dataset containing chord/beat annotation from a music game named 'osu!'. ★11 · Oct 17, 2017 · Updated 8 years ago
- Experiments on the impact of depth in transformers and SSMs. ★41 · Oct 23, 2025 · Updated 7 months ago
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence ★10 · Mar 2, 2025 · Updated last year
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models ★48 · Jul 17, 2025 · Updated 11 months ago
- An Empirical Comparison of Unsupervised Constituency Parsing Methods ★14 · Aug 15, 2021 · Updated 4 years ago
- [CVPR 2026] Official repo for "VideoSSR: Video Self-Supervised Reinforcement Learning" ★39 · Nov 11, 2025 · Updated 7 months ago
- ★14 · Jul 13, 2025 · Updated 11 months ago
- Code for "AtTGen: Attribute Tree Generation for Real-World Attribute Joint Extraction", ACL 2023 ★13 · May 19, 2023 · Updated 3 years ago
- JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning ★10 · Nov 3, 2024 · Updated last year
- Fully open reproduction of DeepSeek-R1 ★11 · Mar 24, 2025 · Updated last year
- Cairo lua bindings with extensions for torch ★15 · Jun 12, 2016 · Updated 10 years ago
- ★14 · Dec 25, 2024 · Updated last year
- "Learning Rhyming Constraints using Structured Adversaries. Jhamtani H., Mehta S., Carbonell J., Berg-Kirkpatrick T. EMNLP-IJCNLP (Short … ★11 · Mar 17, 2020 · Updated 6 years ago
- uncover old chinese textual parallels based on sound ★16 · Updated this week
- Bilingual lexicons map words in one language to their translations in another, and are typically induced by learning linear project… ★19 · Jun 1, 2021 · Updated 5 years ago
- Deep Learning Model for Stylebank with Pytorch ★10 · Nov 15, 2019 · Updated 6 years ago