Step by step explanation/tutorial of llama2.c
☆226 · Oct 9, 2023 · Updated 2 years ago
Alternatives and similar repositories for llama2.c-for-dummies
Users that are interested in llama2.c-for-dummies are comparing it to the libraries listed below
- Load and run Llama from safetensors files in C ☆15 · Oct 24, 2024 · Updated last year
- Inference Llama 2 in one file of pure C ☆19,262 · Aug 6, 2024 · Updated last year
- Reinforcement Learning Algorithms ☆11 · Sep 9, 2021 · Updated 4 years ago
- minimal C implementation of speculative decoding based on llama2.c ☆27 · Jul 15, 2024 · Updated last year
- ☆14 · Mar 28, 2014 · Updated 11 years ago
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ☆37 · Oct 9, 2025 · Updated 5 months ago
- Efficient Finetuning for OpenAI GPT-OSS ☆23 · Oct 2, 2025 · Updated 5 months ago
- llama INT4 cuda inference with AWQ ☆53 · Jan 20, 2025 · Updated last year
- Inference Llama 2 in one file of pure JavaScript(HTML) ☆36 · May 20, 2025 · Updated 10 months ago
- Inference Llama 2 in one file of pure Cuda ☆16 · Aug 20, 2023 · Updated 2 years ago
- ☆15 · Apr 26, 2025 · Updated 10 months ago
- Inference Llama 2 in one file of pure Python ☆426 · Nov 21, 2025 · Updated 3 months ago
- Adds timm pretrained backbone to pytorch's FasterRcnn model ☆12 · Jan 25, 2024 · Updated 2 years ago
- Port of GGML to C# ☆13 · Jul 1, 2023 · Updated 2 years ago
- KoAlpaca: An open-source language model to understand Korean instructions ☆1,578 · Oct 25, 2024 · Updated last year
- PyTorch implementation of Language model compression with weighted low-rank factorization ☆13 · Jun 28, 2023 · Updated 2 years ago
- Telegram chatbot for ChatGPT that can be used personally ☆11 · Apr 18, 2023 · Updated 2 years ago
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API ☆34 · Sep 15, 2023 · Updated 2 years ago
- ☆19 · Sep 16, 2025 · Updated 6 months ago
- Mixed precision training from scratch with Tensors and CUDA ☆28 · May 14, 2024 · Updated last year
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆25 · Feb 22, 2026 · Updated 3 weeks ago
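- MLOps and LLMOps with AWS SageMaker ☆31 · Aug 4, 2023 · Updated 2 years ago
- An AI aiming for the top grade in the Korean language section of the CSAT (Korean college entrance exam) ☆532 · Oct 6, 2024 · Updated last year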
- 2019 AI Robotics Korea 1st NLP Study session [DONE] ☆10 · Oct 10, 2019 · Updated 6 years ago
- ☆12 · Sep 1, 2023 · Updated 2 years ago
- Accelerate multihead attention transformer model using HLS for FPGA ☆11 · Dec 7, 2023 · Updated 2 years ago
- ☆11 · Sep 18, 2023 · Updated 2 years ago
- ☆18 · Nov 9, 2017 · Updated 8 years ago
- Inference Llama 2 in one file of pure 🔥 ☆2,121 · Feb 9, 2026 · Updated last month
- Inference of Mamba, Mamba2 and Mamba3 models in pure C ☆199 · Updated this week
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆43 · Feb 27, 2025 · Updated last year
- Korean SAT leader board ☆167 · Nov 20, 2025 · Updated 4 months ago
- Llama 2 Everywhere (L2E) ☆1,529 · Aug 27, 2025 · Updated 6 months ago
- ☆12 · Aug 19, 2023 · Updated 2 years ago
- Diffusion-based Korean text-to-image generation model ☆12 · Aug 16, 2023 · Updated 2 years ago
- A text expander that fully supports Hangeul. ☆65 · Feb 8, 2026 · Updated last month
- Some microbenchmarks and design docs before commencement ☆12 · Feb 1, 2021 · Updated 5 years ago
- Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Ad… ☆6,082 · Jul 1, 2025 · Updated 8 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆598 · Aug 12, 2025 · Updated 7 months ago