A simple implementation of Llama 1 and 2. The Llama architecture and all models are built from scratch in PyTorch, including GQA (Grouped Query Attention), RoPE (Rotary Positional Embeddings), RMS Norm, the FeedForward block, the Encoder (as this is only for model inference), and SwiGLU (activation function).
☆14May 6, 2024Updated last year
Alternatives and similar repositories for llama-inference
Users that are interested in llama-inference are comparing it to the libraries listed below.
- A curated collection of prompts for Grok Imagine by xAI☆25Oct 19, 2025Updated 5 months ago
- ☆12Dec 14, 2024Updated last year
- SpeechPlus: Small LLM-Based Text-to-Speech Library 🚀☆20May 20, 2025Updated 10 months ago
- Code for the arXiv preprint "MoT: Pre-thinking and Recalling Enable ChatGPT to Self-Improve with Memory-of-Thoughts"☆24Nov 29, 2023Updated 2 years ago
- Python client for Jikan.moe, MyAnimeList unofficial API with good intentions.☆14Dec 20, 2022Updated 3 years ago
- Files used for the evaluation of uiCA☆18Dec 14, 2022Updated 3 years ago
- Conformer block with Rotary Position Embedding, modified from lucidrains' implementation☆18Sep 13, 2024Updated last year
- ☆13Sep 12, 2024Updated last year
- yolosegment2labelme - a Python package that allows you to convert YOLO segmentation prediction results to LabelMe and anylabeling JSON fo…☆10May 8, 2024Updated last year
- A PyTorch implementation of Vector Quantized Variational Autoencoder (VQ-VAE) with EMA updates, pretrained encoder, and K-means initializ…☆21Updated this week
- This repo implements Video generation model using Latent Diffusion Transformers(Latte) in PyTorch and provides training and inference cod…☆17Jan 6, 2025Updated last year
- A replication of the paper "Adaptive Mixtures of Local Experts" applied to the CIFAR-10 image classification dataset.☆12Mar 19, 2021Updated 5 years ago
- A framework to find good combinations of optimizations for computational kernels on GPUs.☆26Nov 30, 2020Updated 5 years ago
- Realizing private and practical pharmacological collaboration☆15Oct 19, 2018Updated 7 years ago
- An HTTP Server for FPGAs☆16Sep 26, 2023Updated 2 years ago
- Course Project for COMP4471 on RWKV☆17Feb 11, 2024Updated 2 years ago
- Environment equipped with reinforcement learning algorithms to train agents to play tic-tac-toe.☆13Mar 4, 2023Updated 3 years ago
- Simple stacktrace analysis tool for the JVM☆24Sep 8, 2017Updated 8 years ago
- Efficient Finetuning for OpenAI GPT-OSS☆23Oct 2, 2025Updated 5 months ago
- An implementation of the base GPT-3 Model architecture from the paper by OPENAI "Language Models are Few-Shot Learners"☆20Jun 29, 2024Updated last year
- An open-source translation agent designed to enhance the quality of text translations by leveraging large language models☆24Updated this week
- Simplistic implementation of Zipformer: A faster and better encoder for automatic speech recognition, in PyTorch☆19Jun 3, 2024Updated last year
- A straightforward explanation of how DeepSeek R1 works☆18Feb 7, 2025Updated last year
- Multi-agent AI app built from scratch in Python without any framework dependency☆15Jan 7, 2025Updated last year
- Load and run Llama from safetensors files in C☆15Oct 24, 2024Updated last year
- ULX5M with GateMate with SDRAM☆50Mar 13, 2026Updated 2 weeks ago
- Community Implementation of the paper: "Multi-Head Mixture-of-Experts" In PyTorch☆29Mar 22, 2026Updated last week
- Fine-tuning large language models (LLMs) is crucial for enhancing performance across domain-specific task applications. This comprehensiv…☆13Sep 19, 2024Updated last year
- This repo implements and trains DallE-1 on a synthetically generated dataset which has colored mnist images on texture/solid background a…☆13Oct 30, 2024Updated last year
- Trained a 114 million Parameter LLM from Scratch.☆19Jul 21, 2024Updated last year
- A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks☆36Oct 31, 2024Updated last year
- Automatically create Anki cards using OpenAI's API in the terminal.☆16Jun 6, 2023Updated 2 years ago
- Jax/Flax implementation of Denoising Diffusion Implicit Models☆20Jul 18, 2022Updated 3 years ago
- Ready to use whisper.cpp models implementation for iOS and Android☆24Sep 4, 2023Updated 2 years ago
- This Streamlit application creates an interactive Data Visualization Assistant that can understand Natural Language Queries and generate …☆18Jan 13, 2025Updated last year
- An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO☆29Mar 23, 2026Updated last week
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, in pure C.☆24Jul 6, 2024Updated last year
- Conformer RNN-Transducer☆14May 25, 2022Updated 3 years ago
- Inference for Llama/Llama2/Llama3 models in NumPy☆21Nov 22, 2023Updated 2 years ago