A simple implementation of Llama 1 and 2. The Llama architecture and all of its components are built from scratch in PyTorch, including GQA (Grouped Query Attention), RoPE (Rotary Positional Embeddings), RMS Norm, the FeedForward block, the Encoder (as this is for inference only), and SwiGLU (the activation function).
☆14 · May 6, 2024 · Updated last year
Alternatives and similar repositories for llama-inference
Users that are interested in llama-inference are comparing it to the libraries listed below.
- ☆14 · Apr 14, 2021 · Updated 5 years ago
- ☆12 · Dec 14, 2024 · Updated last year
- Deep learning in time series analysis ☆13 · May 21, 2018 · Updated 7 years ago
- A barely barebone NumPy implementation of Hierarchical Temporal Memory. ☆11 · Mar 26, 2023 · Updated 3 years ago
- SpeechPlus: Small LLM-Based Text-to-Speech Library 🚀 ☆20 · May 20, 2025 · Updated 10 months ago
- A code sample demonstrating how to share and rebuild a PyTorch GPU tensor via its pointer/reference between different processes. ☆15 · Aug 27, 2024 · Updated last year
- Python client for Jikan.moe, MyAnimeList unofficial API with good intentions. ☆14 · Dec 20, 2022 · Updated 3 years ago
- Machine Learning algorithms implementation in Python from scratch. ☆11 · Feb 10, 2019 · Updated 7 years ago
- Files used for the evaluation of uiCA ☆18 · Dec 14, 2022 · Updated 3 years ago
- OSPO 101 Training Modules ☆21 · Jul 3, 2025 · Updated 9 months ago
- Conformer block with Rotary Position Embedding, modified from lucidrains' implementation ☆18 · Sep 13, 2024 · Updated last year
- ☆13 · Sep 12, 2024 · Updated last year
- Utility of SMI (Secondary Memory Interface) of Raspberry Pi ☆12 · Apr 29, 2017 · Updated 8 years ago
- A PyTorch implementation of Vector Quantized Variational Autoencoder (VQ-VAE) with EMA updates, pretrained encoder, and K-means initializ… ☆21 · Mar 26, 2026 · Updated 3 weeks ago
- This repo implements a video generation model using Latent Diffusion Transformers (Latte) in PyTorch and provides training and inference cod… ☆18 · Jan 6, 2025 · Updated last year
- Code Repository for Blog - How to Productionize Large Language Models (LLMs) ☆12 · Mar 27, 2024 · Updated 2 years ago
- A replication of the paper "Adaptive Mixtures of Local Experts" applied to the CIFAR-10 image classification dataset. ☆12 · Mar 19, 2021 · Updated 5 years ago
- SYN flood implementation using Boost.Asio ☆12 · Nov 20, 2014 · Updated 11 years ago
- Course Project for COMP4471 on RWKV ☆17 · Feb 11, 2024 · Updated 2 years ago
- Environment equipped with reinforcement learning algorithms to train agents to play tic-tac-toe. ☆13 · Mar 4, 2023 · Updated 3 years ago
- Simple stacktrace analysis tool for the JVM ☆24 · Sep 8, 2017 · Updated 8 years ago
- Algorithm study using Python day by day ☆13 · Apr 9, 2017 · Updated 9 years ago
- An open-source translation agent designed to enhance the quality of text translations by leveraging large language models ☆25 · Mar 28, 2026 · Updated 3 weeks ago
- An implementation of the base GPT-3 model architecture from the OpenAI paper "Language Models are Few-Shot Learners" ☆20 · Jun 29, 2024 · Updated last year
- Efficient and comprehensive PyTorch implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis from Mildenh… ☆18 · Nov 7, 2021 · Updated 4 years ago
- A simple implementation of a deep linear PyTorch module ☆21 · Oct 16, 2020 · Updated 5 years ago
- A straightforward explanation of how DeepSeek R1 works ☆18 · Feb 7, 2025 · Updated last year
- Refactoring contents and codes of CS20: Tensorflow for Deep Learning Research ☆62 · Jan 18, 2019 · Updated 7 years ago
- Load and run Llama from safetensors files in C ☆15 · Oct 24, 2024 · Updated last year
- ☆22 · Sep 26, 2024 · Updated last year
- Fine-tuning large language models (LLMs) is crucial for enhancing performance across domain-specific task applications. This comprehensiv… ☆13 · Sep 19, 2024 · Updated last year
- Community implementation of the paper "Multi-Head Mixture-of-Experts" in PyTorch ☆29 · Updated this week
- ☆18 · Oct 1, 2024 · Updated last year
- Trained a 114-million-parameter LLM from scratch. ☆19 · Jul 21, 2024 · Updated last year
- RAG Based LLM Chatbot Built using Open Source Stack (Llama 3.2 Model, BGE Embeddings, and Qdrant running locally within a Docker Containe… ☆19 · Jan 9, 2025 · Updated last year
- A Tiny, Pure Python implementation of Gradient Boosted Trees. ☆14 · Dec 28, 2022 · Updated 3 years ago
- Example code for the book "AI Programming with GPT-4, ChatGPT, LlamaIndex, and LangChain" ☆10 · Jan 16, 2024 · Updated 2 years ago
- Today I Learned ☆21 · Jan 1, 2025 · Updated last year
- PyTorch implementation of Conformer: Convolution-augmented Transformer for Speech Recognition ☆18 · Apr 25, 2021 · Updated 4 years ago