kabir2505 / tiny-mixtral
☆43Updated 3 months ago
Alternatives and similar repositories for tiny-mixtral
Users that are interested in tiny-mixtral are comparing it to the libraries listed below
- ☆46Updated 4 months ago
- minimal GRPO implementation from scratch☆94Updated 4 months ago
- An extension of the nanoGPT repository for training small MOE models.☆164Updated 4 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers☆68Updated 3 months ago
- working implementation of deepseek MLA☆42Updated 6 months ago
- making the official triton tutorials actually comprehensible☆53Updated 2 weeks ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand☆188Updated 2 months ago
- ☆184Updated 7 months ago
- An overview of GRPO & DeepSeek-R1 Training with Open Source GRPO Model Fine Tuning☆34Updated 2 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)☆66Updated 4 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆55Updated 6 months ago
- Low memory full parameter finetuning of LLMs☆52Updated 2 weeks ago
- rl from zero pretrain, can it be done? we'll see.☆66Updated 2 weeks ago
- NanoGPT (124M) quality in 2.67B tokens☆28Updated last month
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆103Updated 5 months ago
- GPU Kernels☆191Updated 3 months ago
- Collection of autoregressive model implementations☆86Updated 3 months ago
- ☆48Updated 11 months ago
- in this repository, i'm going to implement increasingly complex llm inference optimizations☆64Updated 2 months ago
- A collection of tricks and tools to speed up transformer models☆169Updated 2 months ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models☆82Updated 2 months ago
- ☆43Updated 2 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆113Updated 2 months ago
- Memory optimized Mixture of Experts☆51Updated last week
- ☆59Updated last week
- ☆206Updated 5 months ago
- Implementation of a GPT-4o like Multimodal from Scratch using Python☆69Updated 4 months ago
- Building LLaMA 4 MoE from Scratch☆60Updated 3 months ago
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct☆29Updated 5 months ago
- Implementations of Papers that I read, you can read my breakdown in my blog☆78Updated 2 weeks ago