angry-kratos / Simple_Llama3_from_scratch
☆30Updated last year
Alternatives and similar repositories for Simple_Llama3_from_scratch
Users that are interested in Simple_Llama3_from_scratch are comparing it to the libraries listed below
- ☆45Updated 4 months ago
- ☆45Updated 5 months ago
- Collection of autoregressive model implementations☆86Updated 5 months ago
- ☆46Updated 6 months ago
- Notebook and Scripts that showcase running quantized diffusion models on consumer GPUs☆38Updated 11 months ago
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.☆96Updated 9 months ago
- Notebooks for fine-tuning PaliGemma☆117Updated 5 months ago
- Quantization of LLMs and benchmarking.☆10Updated last year
- In this repository, I'm going to implement increasingly complex LLM inference optimizations☆68Updated 4 months ago
- Implements Low-Rank Adaptation (LoRA) fine-tuning from scratch☆80Updated 2 years ago
- Training small GPT-2 style models using Kolmogorov-Arnold networks.☆120Updated last year
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct☆30Updated 7 months ago
- From scratch implementation of a vision language model in pure PyTorch☆243Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM☆59Updated last year
- Complete implementation of Llama2 with/without KV cache & inference 🚀☆48Updated last year
- Prune transformer layers☆69Updated last year
- This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po…☆92Updated 2 years ago
- Deep learning library implemented from scratch in numpy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments.☆52Updated last year
- Simple repository for training small reasoning models☆40Updated 8 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆121Updated 5 months ago
- Fine-tune Gemma 3 on an object detection task☆85Updated 2 months ago
- ☆136Updated last year
- A collection of lightweight interpretability scripts to understand how LLMs think☆56Updated 2 weeks ago
- LoRA and DoRA from Scratch Implementations☆211Updated last year
- RAGs: Simple implementations of Retrieval Augmented Generation (RAG) Systems☆133Updated 8 months ago
- minimal GRPO implementation from scratch☆98Updated 6 months ago
- Distributed training (multi-node) of a Transformer model☆84Updated last year
- NanoGPT-speedrunning for the poor T4 enjoyers☆72Updated 5 months ago
- ☆88Updated last year
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k☆14Updated 7 months ago