abhisheknair10 / Llama3.cu

Lightweight Llama 3 8B Inference Engine in CUDA C

☆36

Alternatives and similar repositories for Llama3.cu:

Users who are interested in Llama3.cu are comparing it to the libraries listed below.

iantbutler01 / ditty
A library for simplifying fine-tuning with multi-GPU setups in the Hugging Face ecosystem.
☆16 Updated 2 months ago
agokrani / distillKitPlus
Easy-to-use, high-performance knowledge distillation for LLMs
☆35 Updated this week
catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆52 Updated 9 months ago
NolanoOrg / SpectraSuite
☆44 Updated 5 months ago
Zyphra / zcookbook
Training hybrid models for dummies.
☆16 Updated 3 weeks ago
google / minja
A minimalistic C++ Jinja templating engine for LLM chat templates
☆43 Updated last week
Codys12 / airllm
AirLLM 70B inference with a single 4GB GPU
☆12 Updated 5 months ago
kyegomez / OpenStrawberry
An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO
☆26 Updated this week
austinsilveria / tricksy
Fast approximate inference on a single GPU with sparsity aware offloading
☆38 Updated last year
shivamsanju / ragswift
🚀 Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform
☆37 Updated 11 months ago
kenhktsui / anyclassifier
One Line To Build Zero-Data Classifiers in Minutes
☆33 Updated 3 months ago
fairydreaming / farel-bench
Testing LLM reasoning abilities with family relationship quizzes.
☆54 Updated last week
robbiemu / llama-gguf-optimize
Scripts and tools for optimizing quantizations in llama.cpp with GGUF imatrices.
☆11 Updated this week
RichardKelley / hflm
A simple library for working with Hugging Face models.
☆14 Updated last week
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆68 Updated last month
iamlemec / bert.cpp
GGML implementation of BERT model with Python bindings and quantization.
☆52 Updated 10 months ago
gau-nernst / quantized-training
Explore training for quantized models
☆11 Updated this week
egozverev / Should-It-Be-Executed-Or-Processed
Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper.
☆46 Updated 7 months ago
EdwardDali / EntropixLab
entropix style sampling + GUI
☆25 Updated 2 months ago
catid / spectral_ssm
Implementation of Spectral State Space Models
☆18 Updated 10 months ago
slashml / awesome-finetuning
☆25 Updated 4 months ago
perk11 / large-model-proxy
Large Model Proxy is designed to make it easy to run multiple resource-heavy Large Models (LM) on the same machine with limited amount of…
☆48 Updated 3 months ago
lukasVierling / FaceRWKV
Course Project for COMP4471 on RWKV
☆16 Updated 11 months ago
TheProxyCompany / proxy-structuring-engine
Ensure AI-generated output follows predefined schemas without compromising creativity, speed, or context.
☆16 Updated last week
UmerHA / triton_util
Make Triton easier
☆42 Updated 6 months ago
andrew-silva / mlx-rlhf
An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace.
☆22 Updated 6 months ago
ahmed-moubtahij / TokenHealer
☆21 Updated 7 months ago
AlpinDale / LLM-Shearing
Preprint: Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
☆28 Updated 11 months ago
catid / lllm
Latent Large Language Models
☆17 Updated 4 months ago
ddh0 / easy-llama
Text generation in Python, as easy as possible
☆47 Updated this week