Teacher-student distillation using DeepSpeed
★19 · Oct 7, 2022 · Updated 3 years ago
Alternatives and similar repositories for distill-bloom-deepspeed
Users that are interested in distill-bloom-deepspeed are comparing it to the libraries listed below.
- Implementation of the algorithm described in "Multi-sentence compression: Finding shortest paths in word graphs" by Katja Filippova. ★12 · Apr 27, 2015 · Updated 11 years ago
- Testing DeepSpeed integration in 🤗 Accelerate ★11 · Jun 28, 2022 · Updated 3 years ago
- Techniques used to run BLOOM at inference in parallel ★37 · Oct 21, 2022 · Updated 3 years ago
- [ICLR 2025] No Preference Left Behind: Group Distributional Preference Optimization ★16 · Apr 21, 2025 · Updated last year
- Few-Shot Preference Optimization (FSPO) personalizes LLMs by reframing reward modeling as a meta-learning problem, enabling rapid adaptat… ★16 · Feb 27, 2025 · Updated last year
- Efficient Finetuning for OpenAI GPT-OSS ★24 · Oct 2, 2025 · Updated 7 months ago
- Contains the code for my Imperial College London Master's thesis on text summarization ★10 · Oct 25, 2022 · Updated 3 years ago
- Train your own GPT2! ★14 · Apr 11, 2023 · Updated 3 years ago
- Directed masked autoencoders ★15 · Mar 25, 2026 · Updated 2 months ago
- C++17 implementation of einops for libtorch - clear and reliable tensor manipulations with einstein-like notation ★11 · Oct 16, 2023 · Updated 2 years ago
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space" ★26 · Jul 21, 2025 · Updated 10 months ago
- Code to reproduce results of our experiments using LoRe ★17 · Apr 8, 2026 · Updated last month
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features" ★17 · Mar 31, 2025 · Updated last year
- Search Google from your terminal. ★37 · Nov 6, 2025 · Updated 6 months ago
- Making of CUDA kernel ★16 · May 27, 2025 · Updated last year
- ★23 · Aug 27, 2025 · Updated 9 months ago
- Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration ★117 · Oct 27, 2022 · Updated 3 years ago
- NamuwikiExtractor for obtaining cleaned text from Namuwiki dumps ★19 · Feb 27, 2022 · Updated 4 years ago
- ★17 · Oct 30, 2022 · Updated 3 years ago
- How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions? ★13 · Aug 16, 2023 · Updated 2 years ago
- ★16 · Mar 12, 2024 · Updated 2 years ago
- A repository of themes for https://github.com/liamg/darktile ★10 · Jul 30, 2021 · Updated 4 years ago
- Make Agent CLI is a powerful command-line tool designed to streamline the management and deployment of AI agents across multiple chains. … ★15 · Sep 3, 2025 · Updated 8 months ago
- Code for the paper "CoS: Enhancing Personalization and Mitigating Bias with Context Steering" ★20 · Dec 13, 2024 · Updated last year
- Artifact for "DX100: A Programmable Data Access Accelerator for Indirection (ISCA 2025)" paper ★18 · Nov 6, 2025 · Updated 6 months ago
- A new DRAM substrate that mitigates the excessive energy consumption from both (i) transmitting unused data on the memory channel and (i… ★14 · Aug 23, 2024 · Updated last year
- ★12 · Oct 1, 2025 · Updated 7 months ago
- Structured argument extraction for Korean ★22 · Feb 17, 2022 · Updated 4 years ago
- takahe is a multi-sentence compression module ★54 · Jun 17, 2021 · Updated 4 years ago
- Verilog code for a low power RFID chip that will communicate with I2C sensors. ★13 · Apr 18, 2014 · Updated 12 years ago
- Calculating FLOPs of Pre-trained Models in NLP ★18 · Mar 29, 2021 · Updated 5 years ago
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ★18 · Dec 19, 2024 · Updated last year
- Code for "Practical Low-Rank Communication Compression in Decentralized Deep Learning" ★17 · Aug 4, 2020 · Updated 5 years ago
- ★16 · Sep 4, 2025 · Updated 8 months ago
- Self-Supervised Speech Pre-training and Representation Learning Toolkit. ★10 · Feb 29, 2024 · Updated 2 years ago
- arXiv submission related tool repository ★15 · May 19, 2026 · Updated last week
- Code for paper: Unraveling the Shift of Visual Information Flow in MLLMs: From Phased Interaction to Efficient Inference ★14 · Jun 7, 2025 · Updated 11 months ago
- ★18 · Oct 8, 2024 · Updated last year
- Yangon Township GeoJSON Data ★11 · Jun 10, 2015 · Updated 10 years ago