☆33 · Dec 31, 2025 · Updated 4 months ago
Alternatives and similar repositories for hybrid-distillation
Users that are interested in hybrid-distillation are comparing it to the libraries listed below.
- ☆248 · Nov 19, 2025 · Updated 5 months ago
- 🔥 A minimal training framework for scaling FLA models ☆385 · Apr 22, 2026 · Updated 2 weeks ago
- Use the tokenizer in parallel to achieve superior acceleration ☆20 · Mar 21, 2024 · Updated 2 years ago
- ☆59 · Jul 9, 2024 · Updated last year
- ☆70 · Jul 8, 2025 · Updated 10 months ago
- [ICLR 2026] GRAPE: Group Representational Position Encoding (https://arxiv.org/abs/2512.07805) ☆86 · Apr 1, 2026 · Updated last month
- Stick-breaking attention ☆63 · Jul 1, 2025 · Updated 10 months ago
- ☆136 · Jun 6, 2025 · Updated 11 months ago
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆150 · Feb 25, 2026 · Updated 2 months ago
- ACL 2026 & NAACL 2025: Bridging Retrieval and Inference through Evidence Fusion ☆13 · Apr 9, 2026 · Updated last month
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without… ☆22 · Mar 15, 2025 · Updated last year
- Code and data for paper "(How) do Language Models Track State?" ☆22 · Mar 31, 2025 · Updated last year
- [CVPR 2026 Highlight] Official implementation of Log-linear Sparse Attention (LLSA). ☆70 · May 1, 2026 · Updated last week
- Class materials, homeworks and videos for probation preparation. ☆24 · Feb 3, 2026 · Updated 3 months ago
- ☆12 · Jan 29, 2021 · Updated 5 years ago
- code for paper "Accessing higher dimensions for unsupervised word translation" ☆22 · Jun 26, 2023 · Updated 2 years ago
- Official Code Repository for the paper "Key-value memory in the brain" ☆31 · Feb 25, 2025 · Updated last year
- Cross-lingual learning in scene text recognition (ICASSP2024) ☆18 · Sep 29, 2024 · Updated last year
- This is the official implementation for paper "On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond". ☆21 · Nov 17, 2025 · Updated 5 months ago
- Engine for collecting, uploading, and downloading model activations ☆28 · Apr 2, 2025 · Updated last year
- Linear Attention Sequence Parallelism (LASP) ☆88 · Jun 4, 2024 · Updated last year
- Efficient retrieval head analysis with triton flash attention that supports topK probability ☆13 · Jun 15, 2024 · Updated last year
- Reproducing R1 for Code with Reliable Rewards ☆12 · Apr 9, 2025 · Updated last year
- AES - Ancient Egyptian Sentences; Corpus of Ancient Egyptian sentences for corpus-linguistic research ☆10 · May 18, 2021 · Updated 4 years ago
- Official Repo for Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics ☆73 · Mar 26, 2026 · Updated last month
- ☆48 · Jun 16, 2025 · Updated 10 months ago
- Experiments on the impact of depth in transformers and SSMs. ☆40 · Oct 23, 2025 · Updated 6 months ago
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence ☆10 · Mar 2, 2025 · Updated last year
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models ☆46 · Jul 17, 2025 · Updated 9 months ago
- An Empirical Comparison of Unsupervised Constituency Parsing Methods ☆14 · Aug 15, 2021 · Updated 4 years ago
- [CVPR 2026] Official repo for "VideoSSR: Video Self-Supervised Reinforcement Learning" ☆37 · Nov 11, 2025 · Updated 5 months ago
- ☆14 · Jul 13, 2025 · Updated 9 months ago
- JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning ☆10 · Nov 3, 2024 · Updated last year
- Fully open reproduction of DeepSeek-R1 ☆11 · Mar 24, 2025 · Updated last year
- Cairo lua bindings with extensions for torch ☆15 · Jun 12, 2016 · Updated 9 years ago
- ☆14 · Dec 25, 2024 · Updated last year
- Repository for the deep-learning framework DIVA-DAF which is built with historical document image analysis in mind. ☆18 · Nov 7, 2024 · Updated last year
- "Learning Rhyming Constraints using Structured Adversaries. Jhamtani H., Mehta S., Carbonell J., Berg-Kirkpatrick T. EMNLP-IJCNLP (Short … ☆11 · Mar 17, 2020 · Updated 6 years ago
- [ICLR 2025] No Preference Left Behind: Group Distributional Preference Optimization ☆15 · Apr 21, 2025 · Updated last year