philpax / ggml
Tensor library for machine learning
☆21 · Updated last year
Alternatives and similar repositories for ggml
Users interested in ggml are comparing it to the repositories listed below.
- General-purpose GPU compute framework built on Vulkan to support 1000s of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆52 · Updated 6 months ago
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆73 · Updated 6 months ago
- Inference of Mamba models in pure C ☆191 · Updated last year
- Simple high-throughput inference library ☆127 · Updated 3 months ago
- Train your own small bitnet model ☆75 · Updated 10 months ago
- Estimating hardware and cloud costs of LLMs and transformer projects ☆18 · Updated 2 months ago
- Inference Llama 2 in one file of pure C++ ☆83 · Updated 2 years ago
- First-token cutoff sampling inference example ☆31 · Updated last year
- Inference code for mixtral-8x7b-32kseqlen ☆101 · Updated last year
- Chroma's fork of hnswlib, a header-only C++/Python library for fast approximate nearest neighbors ☆18 · Updated this week
- Light WebUI for lm.rs ☆24 · Updated 10 months ago
- ☆51 · Updated last year
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- AMD-related optimizations for transformer models ☆83 · Updated last week
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆277 · Updated last year
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs ☆88 · Updated this week
- 33B Chinese LLM, DPO QLoRA, 100K context, AirLLM 70B inference with a single 4GB GPU ☆13 · Updated last year
- tinygrad port of the RWKV large language model ☆45 · Updated 5 months ago
- cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a… ☆42 · Updated last month
- Python bindings for ggml ☆146 · Updated 11 months ago
- ☆19 · Updated 2 weeks ago
- Advanced ultra-low-bitrate compression techniques for the LLaMA family of LLMs ☆110 · Updated last year
- Documentation repository for SGLang, auto-generated from https://github.com/sgl-project/sglang/tree/main/docs ☆72 · Updated this week
- A repository for creating, and sample code for consuming, an ONNX embedding model ☆33 · Updated 2 years ago
- Transformer GPU VRAM estimator ☆66 · Updated last year
- A more memory-efficient rewrite of the HF Transformers implementation of Llama for use with quantized weights ☆64 · Updated last year
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated last year
- Public reports detailing responses to sets of prompts by Large Language Models ☆31 · Updated 7 months ago
- RWKV in nanoGPT style ☆192 · Updated last year
- Easily convert HuggingFace models to GGUF format for llama.cpp ☆22 · Updated last year