anthonix / llm.c

LLM training in simple, raw C/HIP for AMD GPUs

☆37

Related projects

Alternatives and complementary repositories for llm.c

exo-explore / mlx-bitnet
1.58 Bit LLM on Apple Silicon using MLX
☆134 Updated 5 months ago
rejunity / tiny-asic-1_58bit-matrix-mul
Tiny ASIC implementation for "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" matrix multiplication unit
☆110 Updated 6 months ago
deepsilicon / Sila
☆84 Updated last month
rafacelente / bllama
1.58-bit LLaMa model
☆79 Updated 7 months ago
apple / ml-recurrent-drafter
☆96 Updated last month
ikawrakow / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
☆86 Updated this week
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆261 Updated last year
Infini-AI-Lab / Sequoia
Scalable and robust tree-based speculative decoding algorithm
☆313 Updated 2 months ago
kayvr / token-hawk
WebGPU LLM inference tuned by hand
☆146 Updated last year
dorjeduck / llm.mojo
Port of Andrej Karpathy's llm.c to Mojo
☆321 Updated 3 weeks ago
willccbb / mlx_parallm
Fast parallel LLM inference for MLX
☆146 Updated 4 months ago
EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆88 Updated this week
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆171 Updated 3 weeks ago
Cornell-RelaxML / qtip
☆60 Updated last week
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆104 Updated last month
kolinko / effort
An implementation of bucketMul LLM inference
☆214 Updated 4 months ago
mag- / gpu_benchmark
GPU benchmark
☆43 Updated last month
kroggen / mamba.c
Inference of Mamba models in pure C
☆177 Updated 8 months ago
cognitivecomputations / grokadamw
☆116 Updated 2 months ago
xjdr-alt / llmri
look how they massacred my boy
☆53 Updated 3 weeks ago
astramind-ai / BitMat
An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
☆154 Updated 3 weeks ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆158 Updated 2 months ago
groq / groqflow
GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho…
☆99 Updated last week
cognitivecomputations / laserRMT
This is our own implementation of 'Layer Selective Rank Reduction'
☆231 Updated 5 months ago
sumo43 / loopvlm
Run PaliGemma in real time
☆122 Updated 5 months ago
huggingface / optimum-amd
AMD-related optimizations for transformer models
☆57 Updated this week
cognitivecomputations / kraken
☆64 Updated 5 months ago
bigcode-project / selfcodealign
[NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation
☆259 Updated last week
antirez / gguf-tools
GGUF implementation in C as a library and a tools CLI program
☆242 Updated 4 months ago