kolinko / effortLinks

An implementation of bucketMul LLM inference

☆221

Alternatives and similar repositories for effort

Users that are interested in effort are comparing it to the libraries listed below

Sorting:

umuthopeyildirim / DOOM-Mistral
Mistral7B playing DOOM
☆133Updated last year
valine / NeuralFlow
Visualize the intermediate output of Mistral 7B
☆367Updated 6 months ago
EGjoni / DRUGS
Stop messing around with finicky sampling parameters and just use DRµGS!
☆351Updated last year
willccbb / mlx_parallm
Fast parallel LLM inference for MLX
☆204Updated last year
Cerebras / gigaGPT
a small code base for training large models
☆307Updated 3 months ago
kayvr / token-hawk
WebGPU LLM inference tuned by hand
☆151Updated 2 years ago
valine / training-hot-swap
Pytorch script hot swap: Change code without unloading your LLM from VRAM
☆126Updated 3 months ago
ScalingIntelligence / tokasaurus
☆388Updated last week
exo-explore / mlx-bitnet
1.58 Bit LLM on Apple Silicon using MLX
☆217Updated last year
adamkarvonen / chess_llm_interpretability
Visualizing the internal board state of a GPT trained on chess PGN strings, and performing interventions on its internal board state and …
☆208Updated 8 months ago
okuvshynov / slowllama
Finetune llama2-70b and codellama on MacBook Air without quantization
☆448Updated last year
mlecauchois / micrograd-cuda
☆249Updated last year
lechmazur / elimination_game
A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private co…
☆284Updated 3 weeks ago
rentruewang / bocoel
Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 times faster with just a few l…
☆286Updated last month
babycommando / neuralgraffiti
Live-bending a foundation model’s output at neural network level.
☆266Updated 3 months ago
em-llm / EM-LLM-model
☆220Updated 5 months ago
PaulPauls / llama3_interpretability_sae
A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full…
☆622Updated 4 months ago
sumo43 / loopvlm
run paligemma in real time
☆131Updated last year
andyk / recursive_llm
Implement recursion using English as the programming language and an LLM as the runtime.
☆239Updated 2 years ago
antirez / gguf-tools
GGUF implementation in C as a library and a tools CLI program
☆277Updated 6 months ago
neurallambda / awesome-reasoning
a curated list of data for reasoning ai
☆137Updated last year
deepsilicon / Sila
☆89Updated 10 months ago
sdan / selfextend
an implementation of Self-Extend, to expand the context window via grouped attention
☆119Updated last year
agentsea / r1-computer-use
Applying the ideas of Deepseek R1 to computer use
☆216Updated 6 months ago
danielgross / ggml-k8s
Run GGML models with Kubernetes.
☆173Updated last year
llmonpy / needle-in-a-needlestack
☆116Updated 6 months ago
intentee / paddler
Stateful load balancer custom-tailored for llama.cpp 🏓🦙
☆800Updated this week
felafax / felafax
Felafax is building AI infra for non-NVIDIA GPUs
☆566Updated 6 months ago
recmo / cria
Tiny inference-only implementation of LLaMA
☆93Updated last year
M4THYOU / TokenDagger
High-Performance Implementation of OpenAI's TikToken.
☆445Updated last month