tamangmilan / llama3

Building Llama 3 from scratch using PyTorch

☆12

Alternatives and similar repositories for llama3

Users who are interested in llama3 are comparing it to the libraries listed below

chunhuizhang / pytorch_distribute_tutorials
PyTorch distributed tutorials
☆131 Updated last week
shyoulala / LMSYS_BlackPearl
☆68 Updated 8 months ago
akaihaoshuai / baby-llama2-chinese_cybertron
Train an LLM from scratch using a single 24 GB GPU
☆53 Updated this week
hkproj / pytorch-llama
LLaMA 2 implemented from scratch in PyTorch
☆324 Updated last year
JT-Ushio / MHA2MLA
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
☆165 Updated this week
Mxoder / LLM-from-scratch
Notes on reproducing LLM-related work from scratch
☆192 Updated 2 weeks ago
enze5088 / ChineseModernBert
Chinese pre-trained ModernBert
☆45 Updated last month
rasbt / dora-from-scratch
LoRA and DoRA from Scratch Implementations
☆202 Updated last year
BlackPearl-Lab / KddCup-2024-OAG-Challenge-1st-Solutions
☆172 Updated 10 months ago
aju22 / LLaMA2
This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)…
☆65 Updated last year
hengjiUSTC / learn-llm
☆108 Updated 6 months ago
fxmeng / TransMLA
TransMLA: Multi-Head Latent Attention Is All You Need
☆268 Updated this week
shreyansh26 / FlashAttention-PyTorch
Implementation of FlashAttention in PyTorch
☆146 Updated 4 months ago
lucidrains / native-sparse-attention-pytorch
Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper
☆621 Updated last month
FlagAI-Open / OpenSeek
OpenSeek aims to unite the global open source community to drive collaborative innovation in algorithms, data and systems to develop next…
☆185 Updated last week
sunkx109 / llama
Inference code for LLaMA models
☆120 Updated last year
chunhuizhang / bert_t5_gpt
☆70 Updated 2 months ago
waylandzhang / DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k
☆77 Updated 3 months ago
firechecking / CleanTransformer
an implementation of transformer, bert, gpt, and diffusion models for learning purposes
☆154 Updated 7 months ago
dhcode-cpp / NSA-pytorch
DeepSeek Native Sparse Attention pytorch implementation
☆68 Updated 2 months ago
zxuu / Self-Attention
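A complete implementation of the Transformer. Builds the Encoder, Decoder, and Self-attention in detail. Demonstrated with a worked example, including the full input, training, and prediction process. Useful for learning and understanding self-attention and the Transformer.
☆79 Updated last month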
preacher-1 / MLA_tutorial
from MHA, MQA, GQA to MLA by 苏剑林, with code
☆18 Updated 2 months ago
intro-llm / intro-llm-code
☆148 Updated 2 weeks ago
RUC-GSAI / YuLan-Mini
A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.
☆177 Updated last week
mobvoi / seq-monkey-data
☆142 Updated 11 months ago
AI-Study-Han / Zero-Chatgpt
Run through the full ChatGPT technical pipeline from scratch.
☆233 Updated 8 months ago
joey00072 / nanoGRPO
nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)
☆104 Updated last week
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MOE models.
☆142 Updated 2 months ago
open-thought / tiny-grpo
Minimal hackable GRPO implementation
☆225 Updated 3 months ago
zysNLP / quickllm
A repo for updating and debugging Mixtral-7x8B, MOE, ChatGLM3, LLaMa2, BaChuan, Qwen and other LLM models, including new models mixtral, mixtral 8x7b, …
☆44 Updated this week