viai957 / llama-inference
A simple implementation of Llama 1 and Llama 2. The Llama architecture is built from scratch in PyTorch; the models include GQA (Grouped-Query Attention), RoPE (Rotary Positional Embeddings), RMSNorm, the FeedForward block, the Encoder (since this repo is for inference only), and the SwiGLU activation function.
☆13 · Updated last year
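The building blocks named above are small enough to sketch directly. For example, RMSNorm (used throughout Llama in place of LayerNorm) scales each activation vector by the reciprocal root mean square of its features, with no mean subtraction and no bias. A minimal sketch in NumPy (the repo itself uses PyTorch; `rms_norm` is an illustrative helper, not code from the repo):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale x by the reciprocal root-mean-square of its features.

    Unlike LayerNorm, no mean is subtracted and no bias is added;
    `weight` is a per-feature learnable gain.
    """
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

# Toy example: a single 4-dimensional activation vector.
x = np.array([[1.0, 2.0, 3.0, 4.0]])
g = np.ones(4)          # gain initialized to 1, as at the start of training
y = rms_norm(x, g)
# After normalization, the mean square of the features is ~1.
```

In the actual model the gain is a learned parameter per hidden dimension, applied after the normalization, which is what makes RMSNorm cheaper than LayerNorm while behaving similarly in practice.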
Alternatives and similar repositories for llama-inference
Users interested in llama-inference are comparing it to the libraries listed below.
- Step-by-step explanation/tutorial of llama2.c ☆224 · Updated 2 years ago
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model ☆343 · Updated 5 months ago
- 1-Click is all you need. ☆62 · Updated last year
- Google TPU optimizations for transformers models ☆120 · Updated 8 months ago
- ☆68 · Updated last year
- Fine-tuning the Llama3-8B LLM in a multi-GPU environment using DeepSpeed ☆18 · Updated last year
- Showing various ways to serve Keras-based Stable Diffusion ☆111 · Updated 2 years ago
- A hackable, simple, and research-friendly GRPO training framework with high-speed weight synchronization in a multi-node environment ☆31 · Updated last month
- Sakura-SOLAR-DPO: Merge, SFT, and DPO ☆116 · Updated last year
- Easy and Efficient Quantization for Transformers ☆203 · Updated 3 months ago
- Simple Adaptation of BitNet ☆32 · Updated last year
- PyTorch/XLA SPMD test code on Google TPU ☆23 · Updated last year
- The triangle in action! Triton ☆16 · Updated last year
- This project shows how to serve a TF-based image classification model as a web service with TF Serving, Docker, and Kubernetes (GKE) ☆125 · Updated 3 years ago
- LoRA and DoRA from Scratch Implementations ☆211 · Updated last year
- ☆52 · Updated 11 months ago
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)* ☆87 · Updated last year
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆194 · Updated 4 months ago
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆72 · Updated 2 years ago
- GPT-2 fine-tuning pipeline with KerasNLP, TensorFlow, and TensorFlow Extended ☆33 · Updated 2 years ago
- Learn CUDA with PyTorch ☆87 · Updated 3 weeks ago
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ☆231 · Updated 11 months ago
- Experimenting with small language models ☆73 · Updated last year
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆160 · Updated last year
- ☆45 · Updated 5 months ago
- Implementation of DreamBooth in KerasCV and TensorFlow ☆88 · Updated 2 years ago
- Inference of Llama/Llama2/Llama3 models in NumPy ☆21 · Updated last year
- Large-scale language modeling tutorials with PyTorch ☆290 · Updated 3 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated last year
- A PyTorch implementation of "Attention Is All You Need" ☆38 · Updated 4 years ago