★33 · Dec 31, 2025 · Updated 4 months ago
Alternatives and similar repositories for hybrid-distillation
Users that are interested in hybrid-distillation are comparing it to the libraries listed below.
- ★248 · Nov 19, 2025 · Updated 6 months ago
- 🔥 A minimal training framework for scaling FLA models ★389 · Apr 22, 2026 · Updated last month
- ★60 · Jul 9, 2024 · Updated last year
- ★70 · Jul 8, 2025 · Updated 10 months ago
- [ICLR 2026] GRAPE: Group Representational Position Encoding (https://arxiv.org/abs/2512.07805) ★91 · May 13, 2026 · Updated 2 weeks ago
- Stick-breaking attention ★63 · Jul 1, 2025 · Updated 10 months ago
- ★137 · Jun 6, 2025 · Updated 11 months ago
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without… ★21 · Mar 15, 2025 · Updated last year
- [CVPR 2026 Highlight] Official implementation of Log-linear Sparse Attention (LLSA). ★77 · May 1, 2026 · Updated 3 weeks ago
- Open-source toolkit for training, Priming, and serving next generation Hybrid architectures ★70 · May 9, 2026 · Updated 2 weeks ago
- ★12 · Jan 29, 2021 · Updated 5 years ago
- code for paper "Accessing higher dimensions for unsupervised word translation" ★22 · Jun 26, 2023 · Updated 2 years ago
- Source code and dataset for the paper 'Saamayik: A Benchmark and Dataset for English-Sanskrit Translation' ★15 · Oct 11, 2025 · Updated 7 months ago
- ★45 · Nov 1, 2025 · Updated 6 months ago
- This is the official implementation for paper "On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond". ★22 · Nov 17, 2025 · Updated 6 months ago
- Linear Attention Sequence Parallelism (LASP) ★88 · Jun 4, 2024 · Updated last year
- Reproducing R1 for Code with Reliable Rewards ★12 · Apr 9, 2025 · Updated last year
- AES - Ancient Egyptian Sentences; Corpus of Ancient Egyptian sentences for corpus-linguistic research ★10 · May 18, 2021 · Updated 5 years ago
- ★48 · Jun 16, 2025 · Updated 11 months ago
- Official Repo for Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics ★76 · Mar 26, 2026 · Updated 2 months ago
- Experiments on the impact of depth in transformers and SSMs. ★40 · Oct 23, 2025 · Updated 7 months ago
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence ★10 · Mar 2, 2025 · Updated last year
- CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs ★152 · May 22, 2026 · Updated last week
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models ★48 · Jul 17, 2025 · Updated 10 months ago
- An Empirical Comparison of Unsupervised Constituency Parsing Methods ★14 · Aug 15, 2021 · Updated 4 years ago
- [CVPR 2026] Official repo for "VideoSSR: Video Self-Supervised Reinforcement Learning" ★38 · Nov 11, 2025 · Updated 6 months ago
- ★14 · Jul 13, 2025 · Updated 10 months ago
- Code for "AtTGen: Attribute Tree Generation for Real-World Attribute Joint Extraction", ACL 2023 ★13 · May 19, 2023 · Updated 3 years ago
- Cairo lua bindings with extensions for torch ★15 · Jun 12, 2016 · Updated 9 years ago
- ★14 · Dec 25, 2024 · Updated last year
- [ICLR 2025] No Preference Left Behind: Group Distributional Preference Optimization ★16 · Apr 21, 2025 · Updated last year
- Filling the Gaps in Ancient Akkadian Texts: A Masked Language Modelling Approach, Lazar et al., EMNLP 2021 ★14 · Nov 10, 2022 · Updated 3 years ago
- uncover old chinese textual parallels based on sound ★16 · May 21, 2026 · Updated last week
- Bilingual lexicons map words in one language to their translations in another, and are typically induced by learning linear project… ★18 · Jun 1, 2021 · Updated 4 years ago
- Deep Learning Model for Stylebank with Pytorch ★10 · Nov 15, 2019 · Updated 6 years ago
- InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models ★107 · Apr 20, 2026 · Updated last month
- ★19 · Aug 10, 2024 · Updated last year
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… ★57 · Mar 31, 2026 · Updated last month
- HGRN2: Gated Linear RNNs with State Expansion ★57 · Aug 20, 2024 · Updated last year