FareedKhan-dev / train-llama4Links

Building LLaMA 4 MoE from Scratch

☆68

Alternatives and similar repositories for train-llama4

Users that are interested in train-llama4 are comparing it to the libraries listed below

Sorting:

FareedKhan-dev / gpt4o-from-scratch
Implementation of a GPT-4o like Multimodal from Scratch using Python
☆73Updated 7 months ago
kabir2505 / tiny-mixtral
☆45Updated 6 months ago
alexander-moore / vlm
Composition of Multimodal Language Models From Scratch
☆15Updated last year
fangyuan-ksgk / Tiny-GRPO
minimal GRPO implementation from scratch
☆99Updated 8 months ago
FareedKhan-dev / create-million-parameter-llm-from-scratch
Building a 2.3M-parameter LLM from scratch with LLaMA 1 architecture.
☆191Updated last year
fangyuan-ksgk / Mini-LLaVA
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
☆96Updated 11 months ago
FareedKhan-dev / rag-with-rl
Maximizing the Performance of a Simple RAG using RL
☆83Updated 8 months ago
AviSoori1x / seemore
From scratch implementation of a vision language model in pure PyTorch
☆248Updated last year
ariG23498 / gemma3-object-detection
Fine tune Gemma 3 on an object detection task
☆88Updated 4 months ago
Jaykef / ai-algorithms
First-principle implementations of groundbreaking AI algorithms using a wide range of deep learning frameworks, accompanied by supporting…
☆179Updated 3 months ago
ali-bahrainian / RAG_best_practices
☆98Updated 7 months ago
unslothai / unsloth-zoo
Utils for Unsloth https://github.com/unslothai/unsloth
☆171Updated this week
FareedKhan-dev / Building-llama3-from-scratch
LLaMA 3 is one of the most promising open-source model after Mistral, we will recreate it's architecture in a simpler manner.
☆190Updated last year
FareedKhan-dev / text2video-from-scratch
A Straightforward, Step-by-Step Implementation of a Video Diffusion Model
☆65Updated 3 months ago
ALucek / GRPO-Training
An overview of GRPO & DeepSeek-R1 Training with Open Source GRPO Model Fine Tuning
☆37Updated 6 months ago
alperiox / Compact-Language-Models-via-Pruning-and-Knowledge-Distillation
Unofficial implementation of https://arxiv.org/pdf/2407.14679
☆50Updated last year
GaiZhenbiao / Phi3V-Finetuning
Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.
☆58Updated last year
shangshang-wang / Tina
Tina: Tiny Reasoning Models via LoRA
☆305Updated last month
AlexBodner / How_Much_VRAM
☆102Updated last year
jshuadvd / LongRoPE
Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper
☆152Updated last year
agokrani / distillKitPlus
Easy to use, High Performant Knowledge Distillation for LLMs
☆95Updated 6 months ago
Mayankpratapsingh022 / DeepSeek-from-Scratch
☆55Updated 4 months ago
mingyin0312 / RL4LLM
RL significantly the reasoning capability of Qwen2.5-1.5B-Instruct
☆31Updated 8 months ago
mallik3006 / LLM_fine_tuning_llama3_8b
Fine-Tuning Llama3-8B LLM in a multi-GPU environment using DeepSpeed
☆19Updated last year
joey00072 / nanoGRPO
nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)
☆125Updated 6 months ago
Abonia1 / Fine-Tuning-LLMs-Key-Concepts-and-Terms
Fine-tuning large language models (LLMs) is crucial for enhancing performance across domain-specific task applications. This comprehensiv…
☆12Updated last year
wjbmattingly / qwen2-vl-finetune-huggingface
This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets.
☆77Updated 4 months ago
woct0rdho / transformers-qwen3-moe-fused
Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quant, Unsloth
☆207Updated 2 weeks ago
huggingface / large-scale-image-deduplication
☆173Updated 4 months ago
NVIDIA / workbench-llamafactory
This is an NVIDIA AI Workbench example project that demonstrates an end-to-end model development workflow using Llamafactory.
☆67Updated last year