philpax / ggml
Tensor library for machine learning
☆21 · Updated 2 years ago
Alternatives and similar repositories for ggml
Users interested in ggml are comparing it to the libraries listed below.
- GGUF parser in Python ☆28 · Updated last year
- Inference of Mamba models in pure C ☆195 · Updated last year
- Inference Llama/Llama2/Llama3 models in NumPy ☆21 · Updated 2 years ago
- General purpose GPU compute framework built on Vulkan to support 1000s of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆52 · Updated 9 months ago
- Train your own small BitNet model ☆75 · Updated last year
- 1.58-bit LLaMa model ☆83 · Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆94 · Updated this week
- Inference Llama 2 in one file of pure C++ ☆86 · Updated 2 years ago
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆73 · Updated 10 months ago
- ☆52 · Updated last year
- Inference code for mixtral-8x7b-32kseqlen ☆104 · Updated 2 years ago
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth… ☆34 · Updated 8 months ago
- Light WebUI for lm.rs ☆24 · Updated last year
- llama.cpp to PyTorch converter ☆34 · Updated last year
- An innovative library for efficient LLM inference via low-bit quantization ☆351 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆279 · Updated 2 years ago
- LLM inference in C/C++ ☆103 · Updated this week
- Implementation of Mamba in Rust ☆88 · Updated last year
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- Micro Llama is a small Llama-based model with 300M parameters, trained from scratch on a $500 budget ☆163 · Updated 4 months ago
- Python bindings for ggml ☆146 · Updated last year
- AirLLM 70B inference with a single 4GB GPU ☆14 · Updated 5 months ago
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho… ☆114 · Updated 4 months ago
- PB-LLM: Partially Binarized Large Language Models ☆157 · Updated 2 years ago
- First-token cutoff sampling inference example ☆31 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated 2 weeks ago
- Tool to download models from the Hugging Face Hub and convert them to GGML/GGUF for llama.cpp ☆165 · Updated 7 months ago
- RWKV in nanoGPT style ☆195 · Updated last year
- Scripts to create your own MoE models using MLX ☆90 · Updated last year
- The DPAB-α Benchmark ☆32 · Updated 11 months ago