viai957 / llama-inferenceLinks

A simple implementation of Llama 1, 2. Llama Architecture built from scratch using PyTorch all the models are built from scratch that includes GQA (Grouped Query Attention) , RoPE (Rotary Positional Embeddings) , RMS Norm, FeedForward Block, Encoder (as this is only for Inferencing the model) , SwiGLU (Activation Function),
13Updated last year

Alternatives and similar repositories for llama-inference

Users that are interested in llama-inference are comparing it to the libraries listed below

Sorting: