tamangmilan / llama3
Building Llama 3 from scratch using PyTorch
☆12 · Updated 8 months ago
Alternatives and similar repositories for llama3
Users interested in llama3 are comparing it to the libraries listed below.
- pytorch distribute tutorials ☆131 · Updated last week
- ☆68 · Updated 8 months ago
- Train an LLM from scratch using a single 24 GB GPU ☆53 · Updated this week
- LLaMA 2 implemented from scratch in PyTorch ☆324 · Updated last year
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆165 · Updated this week
- Notes on reproducing various LLM techniques from scratch ☆192 · Updated 2 weeks ago
- Chinese pretrained ModernBert ☆45 · Updated last month
- LoRA and DoRA from Scratch Implementations ☆202 · Updated last year
- ☆172 · Updated 10 months ago
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆65 · Updated last year
- ☆108 · Updated 6 months ago
- TransMLA: Multi-Head Latent Attention Is All You Need ☆268 · Updated this week
- Implementation of FlashAttention in PyTorch ☆146 · Updated 4 months ago
- Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper ☆621 · Updated last month
- OpenSeek aims to unite the global open source community to drive collaborative innovation in algorithms, data and systems to develop next… ☆185 · Updated last week
- Inference code for LLaMA models ☆120 · Updated last year
- ☆70 · Updated 2 months ago
- ☆77 · Updated 3 months ago
- An implementation of Transformer, BERT, GPT, and diffusion models for learning purposes ☆154 · Updated 7 months ago
- DeepSeek Native Sparse Attention PyTorch implementation ☆68 · Updated 2 months ago
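- A complete implementation of the Transformer, building the Encoder, Decoder, and Self-attention in detail. It is demonstrated with a practical example covering the full input, training, and prediction process, and can be used to learn and understand self-attention and the Transformer ☆79 · Updated last month
- From MHA, MQA, and GQA to MLA, by Su Jianlin (苏剑林), with code ☆18 · Updated 2 months ago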
- ☆148 · Updated 2 weeks ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆177 · Updated last week
- ☆142 · Updated 11 months ago
- Run through the full ChatGPT technical pipeline from scratch. ☆233 · Updated 8 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆104 · Updated last week
- An extension of the nanoGPT repository for training small MoE models. ☆142 · Updated 2 months ago
- Minimal hackable GRPO implementation ☆225 · Updated 3 months ago
- A repo for updating and debugging Mixtral-7x8B, MoE, ChatGLM3, LLaMa2, Baichuan, Qwen, and other LLM models, including new models Mixtral, Mixtral 8x7B, … ☆44 · Updated this week