philpax / ggml
Tensor library for machine learning
☆21 · Updated 2 years ago
Alternatives and similar repositories for ggml
Users interested in ggml are comparing it to the repositories listed below.
- General-purpose GPU compute framework built on Vulkan to support thousands of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆52 · Updated 9 months ago
- Inference code for mixtral-8x7b-32kseqlen ☆102 · Updated last year
- tinygrad port of the RWKV large language model. ☆44 · Updated 8 months ago
- Inference of Mamba models in pure C ☆192 · Updated last year
- Port of Facebook's LLaMA model in C/C++ ☆21 · Updated 2 years ago
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆73 · Updated 9 months ago
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated 2 years ago
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs ☆93 · Updated this week
- Falcon LLM ggml framework with CPU and GPU support ☆247 · Updated last year
- ToK, aka Tree of Knowledge, for Large Language Models (LLMs). A novel dataset that inspires knowledge symbolic correlation in simple inpu… ☆54 · Updated 2 years ago
- ☆52 · Updated last year
- Inference Llama 2 in one file of pure C++ ☆84 · Updated 2 years ago
- ☆11 · Updated 2 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆53 · Updated last year
- Python bindings for ggml ☆146 · Updated last year
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho… ☆114 · Updated 3 months ago
- First-token cutoff sampling inference example ☆31 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆277 · Updated 2 years ago
- 1.58-bit LLaMA model ☆83 · Updated last year
- GGUF parser in Python ☆28 · Updated last year
- LLM Divergent Thinking Creativity Benchmark: LLMs generate 25 unique words that start with a given letter with no connections to each oth… ☆33 · Updated 8 months ago
- Public reports detailing responses to sets of prompts by Large Language Models ☆32 · Updated 10 months ago
- ☆26 · Updated 2 years ago
- Easily convert HuggingFace models to GGUF format for llama.cpp ☆23 · Updated last year
- Tool to download models from the Hugging Face Hub and convert them to GGML/GGUF for llama.cpp ☆162 · Updated 6 months ago
- Python package for compressing floating-point PyTorch tensors ☆12 · Updated last year
- AMD-related optimizations for transformer models ☆95 · Updated last month
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- llama.cpp to PyTorch converter ☆34 · Updated last year
- 1.58-bit LLM on Apple Silicon using MLX ☆225 · Updated last year
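Several of the repositories above revolve around the GGUF file format used by llama.cpp and ggml tooling. As a rough illustration of what a minimal GGUF parser starts with, the sketch below reads only the fixed-size header (magic bytes, format version, tensor count, metadata key-value count); the function name and return shape are my own for illustration, not taken from any listed project.

```python
import struct

def read_gguf_header(path):
    """Read the fixed-size GGUF header: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata KV count (all little-endian)."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
        n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
    return {"version": version, "tensors": n_tensors, "kv_pairs": n_kv}
```

A full parser (like the "GGUF parser in Python" project listed above) would continue past this header into the typed metadata key-value pairs and tensor descriptors, but the header alone is enough to sanity-check a file before conversion.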