abhisheknair10 / llama3.cu

Lightweight Llama 3 8B Inference Engine in CUDA C

☆43

Alternatives and similar repositories for llama3.cu:

Users who are interested in llama3.cu are comparing it to the libraries listed below.

NolanoOrg / SpectraSuite
☆44 Updated 6 months ago
agokrani / distillKitPlus
Easy-to-use, high-performance knowledge distillation for LLMs
☆40 Updated 2 weeks ago
sebulo / LoQT
☆79 Updated 2 months ago
samchaineau / llm_slerp_generation
Repo hosting code and materials related to speeding up LLM inference using token merging.
☆34 Updated 9 months ago
BorealisAI / neuzip
Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep…
☆48 Updated 2 months ago
ikawrakow / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
☆133 Updated this week
Zyphra / transformers_zamba2
☆42 Updated last week
UmerHA / triton_util
Make Triton easier
☆44 Updated 7 months ago
Zyphra / zcookbook
Training hybrid models for dummies.
☆18 Updated 2 weeks ago
Birch-san / booru-embed
[WIP] Transformer to embed Danbooru labelsets
☆13 Updated 10 months ago
fairydreaming / farel-bench
Testing LLM reasoning abilities with family relationship quizzes.
☆57 Updated this week
google / minja
A minimalistic C++ Jinja templating engine for LLM chat templates
☆104 Updated this week
austinsilveria / tricksy
Fast approximate inference on a single GPU with sparsity aware offloading
☆38 Updated last year
slashml / awesome-finetuning
☆27 Updated 5 months ago
salykova / sgemm.cu
SGEMM that beats cuBLAS
☆68 Updated last week
Chillee / llm.c
LLM training in simple, raw C/CUDA
☆18 Updated 8 months ago
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆73 Updated 2 months ago
facebookresearch / dual-system-for-visual-language-reasoning
GitHub repo for Peifeng's internship project
☆13 Updated last year
MNoorFawi / curlora
The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation.
☆41 Updated 5 months ago
catid / lllm
Latent Large Language Models
☆17 Updated 5 months ago
kyegomez / OpenStrawberry
An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO
☆27 Updated this week
zaydzuhri / pythia-mlkv
Multi-Layer Key-Value sharing experiments on Pythia models
☆32 Updated 7 months ago
nyunAI / PruneGPT
☆52 Updated 7 months ago
ryao / llama3.c
A fork of llama3.c used to do some R&D on inferencing
☆17 Updated last month
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27 Updated 3 months ago
nexusflowai / NexusBench
Nexusflow function call, tool use, and agent benchmarks.
☆19 Updated last month
foundation-model-stack / bamba
Train, tune, and infer Bamba model
☆80 Updated 2 weeks ago
gau-nernst / quantized-training
Explore training for quantized models
☆13 Updated 3 weeks ago
catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆52 Updated 9 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆113 Updated last month