philpax / ggml
Tensor library for machine learning
☆20 · Updated last year
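ggml is a plain-C library built around a memory arena, tensors, and an explicit compute graph. A minimal sketch of that flow is below; it assumes a circa-2023 revision of the API (names such as `ggml_new_graph` and `ggml_graph_compute_with_ctx` have shifted between versions), so treat it as illustrative rather than canonical.

```c
// Minimal ggml sketch: build and evaluate c = a + b on the CPU.
// Assumes a circa-2023 ggml API; function names vary across versions.
#include <stdio.h>
#include "ggml.h"

int main(void) {
    // ggml allocates all tensors and graph metadata from one arena.
    struct ggml_init_params params = {
        .mem_size   = 16 * 1024 * 1024,  // 16 MiB arena
        .mem_buffer = NULL,              // let ggml allocate it
        .no_alloc   = false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // Define the computation symbolically: c = a + b.
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    ggml_set_f32(a, 1.5f);  // fill every element of a with 1.5
    ggml_set_f32(b, 2.0f);  // fill every element of b with 2.0
    struct ggml_tensor * c = ggml_add(ctx, a, b);

    // Build the forward graph and run it.
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/1);

    printf("c[0] = %f\n", ggml_get_f32_1d(c, 0));  // expect 3.5
    ggml_free(ctx);
    return 0;
}
```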
Related projects
Alternatives and complementary repositories for ggml
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 · Updated this week
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆36 · Updated last year
- tinygrad port of the RWKV large language model ☆43 · Updated 4 months ago
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆222 · Updated last month
- Experiments with BitNet inference on CPU ☆50 · Updated 7 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆86 · Updated this week
- 1.58-bit LLM on Apple Silicon using MLX ☆134 · Updated 6 months ago
- GGML implementation of the BERT model with Python bindings and quantization ☆51 · Updated 8 months ago
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆17 · Updated 10 months ago
- GitHub repo for Peifeng's internship project ☆12 · Updated last year
- QuIP quantization ☆46 · Updated 7 months ago
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated last year
- First-token-cutoff sampling inference example ☆28 · Updated 9 months ago
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho… ☆99 · Updated last week
- Light WebUI for lm.rs ☆21 · Updated 3 weeks ago
- A super simple web interface to perform blind tests on LLM outputs ☆26 · Updated 8 months ago
- Inference code for mixtral-8x7b-32kseqlen ☆98 · Updated 10 months ago
- GRDN.AI app for garden optimization ☆69 · Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆50 · Updated 10 months ago
- Simple, Fast, Parallel Hugging Face GGML model downloader written in Python ☆24 · Updated last year
- Scripts to create your own MoE models using MLX ☆86 · Updated 8 months ago
- 33B Chinese LLM, DPO QLoRA, 100K context, AirLLM 70B inference with a single 4GB GPU ☆12 · Updated 6 months ago
- Tiny ASIC implementation of the matrix multiplication unit from "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" ☆110 · Updated 6 months ago
- Public reports detailing responses to sets of prompts by Large Language Models ☆25 · Updated last year
- GPU Power and Performance Manager ☆46 · Updated 3 weeks ago