Step-by-step explanation/tutorial of llama2.c
☆229 · Oct 9, 2023 · Updated 2 years ago
Alternatives and similar repositories for llama2.c-for-dummies
Users that are interested in llama2.c-for-dummies are comparing it to the libraries listed below.
- Load and run Llama from safetensors files in C ☆15 · Oct 24, 2024 · Updated last year
- Inference Llama 2 in one file of pure C ☆19,379 · Aug 6, 2024 · Updated last year
- minimal C implementation of speculative decoding based on llama2.c ☆29 · Jul 15, 2024 · Updated last year
- A tool for manual conversion of BGE-M3 models with preserved trainable variables and direct control over model outputs. ☆44 · Sep 7, 2025 · Updated 7 months ago
- ☆14 · Mar 28, 2014 · Updated 12 years ago
- Efficient Finetuning for OpenAI GPT-OSS ☆23 · Oct 2, 2025 · Updated 6 months ago
- llama INT4 cuda inference with AWQ ☆54 · Jan 20, 2025 · Updated last year
- Inference Llama 2 in one file of pure JavaScript(HTML) ☆36 · May 20, 2025 · Updated 10 months ago
- Inference Llama 2 in one file of pure Cuda ☆17 · Aug 20, 2023 · Updated 2 years ago
- Inference Llama 2 in one file of pure C++ ☆86 · Aug 4, 2023 · Updated 2 years ago
- Fast and slim Javascript implementation of AES in ECB and CTR modes ☆15 · May 13, 2025 · Updated 11 months ago
- ☆15 · Apr 26, 2025 · Updated 11 months ago
- Inference Llama 2 in one file of pure Python ☆425 · Nov 21, 2025 · Updated 4 months ago
- LLM as a Chatbot Service ☆17 · Aug 28, 2023 · Updated 2 years ago
- Inference Llama/Llama2/Llama3 Models in NumPy ☆21 · Nov 22, 2023 · Updated 2 years ago
- Port of GGML to C# ☆13 · Jul 1, 2023 · Updated 2 years ago
- Demonstration of a factory pattern where the types automatically register themselves ☆13 · Mar 13, 2019 · Updated 7 years ago
- Telegram chatbot for ChatGPT that can be used personally ☆11 · Apr 18, 2023 · Updated 2 years ago
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API ☆36 · Sep 15, 2023 · Updated 2 years ago
- ☆19 · Sep 16, 2025 · Updated 6 months ago
- Mixed precision training from scratch with Tensors and CUDA ☆29 · May 14, 2024 · Updated last year
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆74 · Oct 1, 2023 · Updated 2 years ago
- Inference Llama 2 in one file of pure 🔥 ☆2,119 · Feb 9, 2026 · Updated 2 months ago
- Implementation for IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆25 · Feb 22, 2026 · Updated last month
- An AI that aims for the top grade (Grade 1) on the Korean-language section of the Korean CSAT ☆530 · Apr 2, 2026 · Updated last week
- 2019 AI Robotics Korea 1st NLP Study session [DONE] ☆10 · Oct 10, 2019 · Updated 6 years ago
- ☆12 · Sep 1, 2023 · Updated 2 years ago
- A minimal implementation of Gaussian process regression in PyTorch ☆65 · Jan 8, 2023 · Updated 3 years ago
- ☆11 · Sep 18, 2023 · Updated 2 years ago
- Inference of Mamba, Mamba2 and Mamba3 models in pure C ☆199 · Mar 18, 2026 · Updated 3 weeks ago
- Korean SAT leader board ☆168 · Nov 20, 2025 · Updated 4 months ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆45 · Feb 27, 2025 · Updated last year
- Local inference for Llama and Tongyi Qianwen (Qwen) large models, implemented with Apple Metal ☆10 · Apr 26, 2024 · Updated last year
- ☆12 · Aug 19, 2023 · Updated 2 years ago
- A text expander that fully supports Hangeul. ☆66 · Feb 8, 2026 · Updated 2 months ago
- Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Ad… ☆6,079 · Jul 1, 2025 · Updated 9 months ago
- A collection of instruction data and scripts for machine translation. ☆20 · Sep 23, 2023 · Updated 2 years ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆599 · Aug 12, 2025 · Updated 8 months ago
- ☆30 · Oct 3, 2022 · Updated 3 years ago