viai957 / llama-inference

A simple implementation of Llama 1, 2. Llama Architecture built from scratch using PyTorch all the models are built from scratch that includes GQA (Grouped Query Attention) , RoPE (Rotary Positional Embeddings) , RMS Norm, FeedForward Block, Encoder (as this is only for Inferencing the model) , SwiGLU (Activation Function),
13Updated 11 months ago

Alternatives and similar repositories for llama-inference:

Users that are interested in llama-inference are comparing it to the libraries listed below