tamangmilan / llama3
Building Llama 3 from scratch using PyTorch
☆10 · Updated 6 months ago
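For orientation, here is a minimal sketch of two Llama-style building blocks that a from-scratch Llama 3 implementation typically includes (RMSNorm and rotary position embeddings). This is an illustrative sketch only; the class names, argument names, and defaults below are assumptions and are not taken from this repository's code.

```python
# Illustrative sketch of Llama-style RMSNorm and rotary position embeddings.
# Names and hyperparameters are assumptions, not the repository's actual code.
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square LayerNorm as used in Llama-family models (no mean-centering)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the RMS over the last dimension, then apply a learned scale.
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


def rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (batch, seq, heads, head_dim)."""
    _, seq_len, _, head_dim = x.shape
    half = head_dim // 2
    # Per-dimension rotation frequencies and per-position angles.
    freqs = 1.0 / (base ** (torch.arange(0, half, dtype=torch.float32) / half))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), freqs)  # (seq, half)
    cos = angles.cos()[None, :, None, :]   # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate the two halves of each head dimension by the position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


if __name__ == "__main__":
    h = torch.randn(2, 16, 8, 64)            # (batch, seq, heads, head_dim)
    print(rotary_embedding(h).shape)          # torch.Size([2, 16, 8, 64])
    print(RMSNorm(64)(h).shape)               # torch.Size([2, 16, 8, 64])
```

This sketch uses the split-halves rotation variant of RoPE; interleaved-pair variants also exist, and the repository may use either.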
Alternatives and similar repositories for llama3:
Users interested in llama3 are comparing it to the repositories listed below.
- LLaMA 2 implemented from scratch in PyTorch ☆299 · Updated last year
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊 ☆264 · Updated last month
- Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation: https://www.youtube.com/watch?v=vAmKB7iPkWw ☆411 · Updated 3 months ago
- ☆104 · Updated 3 months ago
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆62 · Updated last year
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆286 · Updated 10 months ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆392 · Updated 4 months ago
- LoRA and DoRA from Scratch Implementations (a minimal LoRA sketch follows after this list) ☆198 · Updated last year
- Implementation of FlashAttention in PyTorch ☆136 · Updated last month
- First-principle implementations of groundbreaking AI algorithms using a wide range of deep learning frameworks, accompanied by supporting… ☆140 · Updated last week
- ☆130 · Updated 2 months ago
- Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge. ☆78 · Updated last year
- Efficient LLM Inference over Long Sequences ☆362 · Updated 3 weeks ago
- Notes and commented code for RLHF (PPO) ☆72 · Updated last year
- PyTorch distributed training tutorials ☆114 · Updated last week
- Official repository for ORPO ☆443 · Updated 9 months ago
- Notes about the LLaMA 2 model ☆53 · Updated last year
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning ☆348 · Updated 6 months ago
- Implementation of DoRA ☆290 · Updated 9 months ago
- Train a LLaVA model with better Chinese-language support, with the training code and data open-sourced. ☆51 · Updated 6 months ago
- Large Reasoning Models ☆800 · Updated 3 months ago
- Train an LLM from scratch using a single 24 GB GPU ☆50 · Updated 4 months ago
- A highly capable 2.4B lightweight LLM using only 1T tokens of pre-training data, with all details released. ☆155 · Updated this week
- A family of compressed models obtained via pruning and knowledge distillation ☆328 · Updated 3 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆196 · Updated 2 months ago
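As a companion to the "LoRA and DoRA from Scratch Implementations" entry above, here is a minimal LoRA sketch assuming the standard low-rank-update formulation W' = W + (alpha/r)·B·A. The class and parameter names are illustrative assumptions and are not taken from that repository.

```python
# Illustrative LoRA sketch: a frozen linear layer plus a trainable low-rank update.
# Names and defaults are assumptions, not the referenced repository's actual code.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear layer with a trainable low-rank correction B @ A."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                  # freeze the pretrained weight
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)   # small random init
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))         # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the scaled low-rank correction (x @ A^T @ B^T).
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


if __name__ == "__main__":
    layer = LoRALinear(512, 512, r=8)
    print(layer(torch.randn(4, 512)).shape)   # torch.Size([4, 512])
```

DoRA extends this idea by decomposing the weight into magnitude and direction and applying the low-rank update to the direction only; the repository above covers both variants.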