nomic-ai / kompute
General-purpose GPU compute framework built on Vulkan to support thousands of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous, and optimized for advanced GPU data-processing use cases. Backed by the Linux Foundation.
☆41 · Updated last month
Related projects
Alternatives and complementary repositories for kompute
- GGML implementation of the BERT model with Python bindings and quantization. ☆51 · Updated 9 months ago
- GPT-2 implementation in C++ using Ort. ☆25 · Updated 3 years ago
- Port of Suno AI's Bark in C/C++ for fast inference. ☆54 · Updated 7 months ago
- Course project for COMP4471 on RWKV. ☆16 · Updated 9 months ago
- llama.cpp fork with additional SOTA quants and improved performance. ☆93 · Updated this week
- Asynchronous/distributed speculative evaluation for Llama 3. ☆37 · Updated 3 months ago
- Inference of Mamba models in pure C. ☆178 · Updated 8 months ago
- tinygrad port of the RWKV large language model. ☆43 · Updated 5 months ago
- ggml implementation of embedding models, including SentenceTransformer and BGE. ☆52 · Updated 11 months ago
- Stable Diffusion in pure C/C++. ☆60 · Updated last year
- Port of Microsoft's BioGPT in C/C++ using ggml. ☆87 · Updated 9 months ago
- ☆43 · Updated 4 months ago
- instinct.cpp provides ready-to-use alternatives to the OpenAI Assistant API and built-in utilities for developing AI agent applications (RAG, …). ☆37 · Updated 4 months ago
- The Next Generation Multi-Modality Superintelligence. ☆70 · Updated 2 months ago
- Python bindings for ggml. ☆132 · Updated 2 months ago
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs. ☆89 · Updated this week
- Testing LLM reasoning abilities with family-relationship quizzes. ☆42 · Updated this week
- Train your own small BitNet model. ☆56 · Updated last month
- RWKV in nanoGPT style. ☆177 · Updated 5 months ago
- Some simple scripts for day-to-day work with LLMs and the Hugging Face Hub. ☆155 · Updated last year
- ☆40 · Updated last year
- Inference of Llama/Llama 2 models in NumPy. ☆20 · Updated 11 months ago
- An all-new language model that processes ultra-long sequences of 100,000+ tokens, ultra-fast. ☆137 · Updated 2 months ago
- Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks. ☆31 · Updated 6 months ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆66 · Updated last year
- RWKV-7: Surpassing GPT. ☆45 · Updated this week
- Experiments with BitNet inference on CPU. ☆50 · Updated 7 months ago
- New optimizer. ☆19 · Updated 3 months ago
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO. ☆22 · Updated this week
- A faithful clone of Karpathy's llama2.c (one-file inference, zero dependencies) but fully functional with LLaMA 3 8B base and instruct mode… ☆51 · Updated 3 months ago