kolinko / effort
An implementation of bucketMul LLM inference
☆216 Updated 9 months ago
Alternatives and similar repositories for effort:
Users who are interested in effort compare it to the libraries listed below.
- 1.58 Bit LLM on Apple Silicon using MLX ☆195 Updated 11 months ago
- WebGPU LLM inference tuned by hand ☆149 Updated last year
- Tiny inference-only implementation of LLaMA ☆92 Updated last year
- Mistral7B playing DOOM ☆130 Updated 8 months ago
- Visualize the intermediate output of Mistral 7B ☆354 Updated 2 months ago
- Fast parallel LLM inference for MLX ☆178 Updated 9 months ago
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆735 Updated last week
- Stop messing around with finicky sampling parameters and just use DRµGS! ☆348 Updated 10 months ago
- Hierarchical Navigable Small Worlds☆71Updated this week
- Visualizing the internal board state of a GPT trained on chess PGN strings, and performing interventions on its internal board state and … ☆204 Updated 4 months ago
- ☆163 Updated 10 months ago
- ☆243 Updated last year
- Algebraic enhancements for GEMM & AI accelerators ☆275 Updated last month
- Live-bending a foundation model’s output at neural network level. ☆188 Updated this week
- run paligemma in real time ☆131 Updated 10 months ago
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆131 Updated this week
- an implementation of Self-Extend, to expand the context window via grouped attention ☆119 Updated last year
- Inference of Mamba models in pure C ☆187 Updated last year
- Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 times faster with just a few l… ☆281 Updated last month
- A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full… ☆606 Updated 2 weeks ago
- ☆158 Updated 2 weeks ago
- ☆205 Updated 2 months ago
- throwaway GPT inference ☆140 Updated 10 months ago
- Dead Simple LLM Abliteration ☆210 Updated last month
- A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private co… ☆233 Updated this week
- ☆278 Updated this week
- Felafax is building AI infra for non-NVIDIA GPUs ☆558 Updated 2 months ago
- Applying the ideas of Deepseek R1 to computer use ☆212 Updated 2 months ago
- GGUF implementation in C as a library and a tools CLI program ☆265 Updated 3 months ago
- Run GGML models with Kubernetes. ☆174 Updated last year