⭐ 31 · Dec 31, 2025 · Updated 2 months ago
Alternatives and similar repositories for hybrid-distillation
Users who are interested in hybrid-distillation are comparing it to the libraries listed below.
- ⭐ 240 · Nov 19, 2025 · Updated 4 months ago
- 🔥 A minimal training framework for scaling FLA models · ⭐ 358 · Nov 15, 2025 · Updated 4 months ago
- Use the tokenizer in parallel to achieve superior acceleration · ⭐ 20 · Mar 21, 2024 · Updated 2 years ago
- [ICLR 2026] GRAPE: Group Representational Position Encoding (https://arxiv.org/abs/2512.07805) · ⭐ 82 · Mar 10, 2026 · Updated 2 weeks ago
- ⭐ 68 · Jul 8, 2025 · Updated 8 months ago
- ⭐ 133 · Jun 6, 2025 · Updated 9 months ago
- Bridging Retrieval and Inference through Evidence Fusion · ⭐ 13 · Oct 20, 2025 · Updated 5 months ago
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning · ⭐ 146 · Feb 25, 2026 · Updated last month
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without… · ⭐ 21 · Mar 15, 2025 · Updated last year
- Official implementation of Log-linear Sparse Attention (LLSA) · ⭐ 64 · Feb 2, 2026 · Updated last month
- Code and data for the paper "(How) do Language Models Track State?" · ⭐ 22 · Mar 31, 2025 · Updated 11 months ago
- ⭐ 12 · Nov 3, 2024 · Updated last year
- Class materials, homework, and videos for probation preparation · ⭐ 22 · Feb 3, 2026 · Updated last month
- ⭐ 12 · Jan 29, 2021 · Updated 5 years ago
- Code for the paper "Accessing higher dimensions for unsupervised word translation" · ⭐ 22 · Jun 26, 2023 · Updated 2 years ago
- Official code repository for the paper "Key-value memory in the brain" · ⭐ 31 · Feb 25, 2025 · Updated last year
- ⭐ 45 · Nov 1, 2025 · Updated 4 months ago
- Cross-lingual learning in scene text recognition (ICASSP 2024) · ⭐ 18 · Sep 29, 2024 · Updated last year
- Linear Attention Sequence Parallelism (LASP) · ⭐ 88 · Jun 4, 2024 · Updated last year
- Engine for collecting, uploading, and downloading model activations · ⭐ 27 · Apr 2, 2025 · Updated 11 months ago
- Efficient retrieval-head analysis with Triton flash attention that supports top-k probability · ⭐ 13 · Jun 15, 2024 · Updated last year
- Reproducing R1 for Code with Reliable Rewards · ⭐ 12 · Apr 9, 2025 · Updated 11 months ago
- Official repo for "Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics" · ⭐ 72 · Jan 13, 2026 · Updated 2 months ago
- ⭐ 13 · Feb 1, 2024 · Updated 2 years ago
- AES - Ancient Egyptian Sentences: corpus of Ancient Egyptian sentences for corpus-linguistic research · ⭐ 10 · May 18, 2021 · Updated 4 years ago
- A public dataset containing chord/beat annotation from a music game named 'osu!' · ⭐ 11 · Oct 17, 2017 · Updated 8 years ago
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence · ⭐ 10 · Mar 2, 2025 · Updated last year
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models · ⭐ 46 · Jul 17, 2025 · Updated 8 months ago
- ⭐ 58 · Jul 9, 2024 · Updated last year
- An Empirical Comparison of Unsupervised Constituency Parsing Methods · ⭐ 14 · Aug 15, 2021 · Updated 4 years ago
- ⭐ 15 · Jul 13, 2025 · Updated 8 months ago
- JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning · ⭐ 10 · Nov 3, 2024 · Updated last year
- ⭐ 14 · Dec 25, 2024 · Updated last year
- [ICLR 2025] No Preference Left Behind: Group Distributional Preference Optimization · ⭐ 15 · Apr 21, 2025 · Updated 11 months ago
- "Learning Rhyming Constraints using Structured Adversaries. Jhamtani H., Mehta S., Carbonell J., Berg-Kirkpatrick T. EMNLP-IJCNLP (Short … · ⭐ 11 · Mar 17, 2020 · Updated 6 years ago
- Uncover old Chinese textual parallels based on sound · ⭐ 15 · Feb 23, 2026 · Updated last month
- Filling the Gaps in Ancient Akkadian Texts: A Masked Language Modelling Approach, Lazar et al., EMNLP 2021 · ⭐ 13 · Nov 10, 2022 · Updated 3 years ago
- Bilingual lexicons map words in one language to their translations in another, and are typically induced by learning linear project… · ⭐ 18 · Jun 1, 2021 · Updated 4 years ago
- Deep Learning Model for Stylebank with PyTorch · ⭐ 10 · Nov 15, 2019 · Updated 6 years ago