Step by step explanation/tutorial of llama2.c
☆233Oct 9, 2023Updated 2 years ago
Alternatives and similar repositories for llama2.c-for-dummies
Users that are interested in llama2.c-for-dummies are comparing it to the libraries listed below.
- Load and run Llama from safetensors files in C☆15Oct 24, 2024Updated last year
- Inference Llama 2 in one file of pure C☆19,548Aug 6, 2024Updated last year
- minimal C implementation of speculative decoding based on llama2.c☆30Jul 15, 2024Updated last year
- ☆14Mar 28, 2014Updated 12 years ago
- Repo hosting code and materials related to speeding up LLM inference using token merging.☆37Oct 9, 2025Updated 7 months ago
- Efficient Finetuning for OpenAI GPT-OSS☆24Oct 2, 2025Updated 7 months ago
- llama INT4 cuda inference with AWQ☆54Jan 20, 2025Updated last year
- A tool for manual conversion of BGE-M3 models with preserved trainable variables and direct control over model outputs.☆44Sep 7, 2025Updated 8 months ago
- Inference Llama 2 in one file of pure JavaScript(HTML)☆36May 20, 2025Updated last year
- Inference Llama 2 in one file of pure Cuda☆17Aug 20, 2023Updated 2 years ago
- 2023 Korea University MatKor study group: Rust programming basics + building an interpreter☆345Aug 10, 2023Updated 2 years ago
- ☆15Apr 26, 2025Updated last year
- Optimizing the Deployment of Tiny Transformers on Low-Power MCUs☆36Sep 2, 2024Updated last year
- Inference Llama/Llama2/Llama3 Models in NumPy☆20Nov 22, 2023Updated 2 years ago
- Adds timm pretrained backbone to pytorch's FasterRcnn model☆12Jan 25, 2024Updated 2 years ago
- LLM as a Chatbot Service☆17Aug 28, 2023Updated 2 years ago
- KoAlpaca: An open-source language model to understand Korean instructions☆1,577Oct 25, 2024Updated last year
- PyTorch implementation of Language model compression with weighted low-rank factorization☆14Jun 28, 2023Updated 2 years ago
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API☆37Sep 15, 2023Updated 2 years ago
- Simple HTTP serving for PyTorch 🚀☆10Oct 15, 2020Updated 5 years ago
- Mixed precision training from scratch with Tensors and CUDA☆30May 14, 2024Updated 2 years ago
- Inference Llama 2 in one file of pure 🔥☆2,122Feb 9, 2026Updated 3 months ago
- Implementation for IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024).☆25Feb 22, 2026Updated 3 months ago
- An AI aiming for the top grade (grade 1) on the Korean-language section of the Korean CSAT☆531Apr 2, 2026Updated last month
- Accelerate multihead attention transformer model using HLS for FPGA☆12Dec 7, 2023Updated 2 years ago
- ☆12Sep 1, 2023Updated 2 years ago
- ☆18Nov 9, 2017Updated 8 years ago
- Korean SAT leader board☆169Nov 20, 2025Updated 6 months ago
- Inference of Mamba, Mamba2 and Mamba3 models in pure C☆201Mar 18, 2026Updated 2 months ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.☆45Feb 27, 2025Updated last year
- Local inference of Llama and Tongyi Qianwen (Qwen) large models, implemented with Apple Metal☆10Apr 26, 2024Updated 2 years ago
- An interactive storybook built with the help of ChatGPT and Stable Diffusion.☆13Jun 28, 2023Updated 2 years ago
- ☆12Aug 19, 2023Updated 2 years ago
- Diffusion-based Korean text-to-image generation model☆12Aug 16, 2023Updated 2 years ago
- Some microbenchmarks and design docs before commencement☆11Feb 1, 2021Updated 5 years ago
- Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Ad…☆6,081Jul 1, 2025Updated 10 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.☆599May 13, 2026Updated last week
- ☆20Apr 25, 2021Updated 5 years ago
- Hacks for PyTorch☆19Apr 18, 2023Updated 3 years ago