philpax / ggml
Tensor library for machine learning
☆21 · Updated last year
Alternatives and similar repositories for ggml
Users who are interested in ggml are comparing it to the libraries listed below.
- Light WebUI for lm.rs ☆23 · Updated 7 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆86 · Updated this week
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho… ☆109 · Updated 2 months ago
- Experiments with BitNet inference on CPU ☆55 · Updated last year
- Training hybrid models for dummies. ☆21 · Updated 4 months ago
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆17 · Updated last year
- ☆47 · Updated 10 months ago
- Public reports detailing responses to sets of prompts by Large Language Models. ☆30 · Updated 4 months ago
- First token cutoff sampling inference example ☆30 · Updated last year
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆35 · Updated last year
- Chunk Dedupe Estimation ☆14 · Updated 6 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆11 · Updated 7 months ago
- GRDN.AI app for garden optimization ☆70 · Updated last year
- A library for simplifying fine-tuning with multi-GPU setups in the Hugging Face ecosystem. ☆16 · Updated 6 months ago
- Port of Facebook's LLaMA model in C/C++ ☆20 · Updated last year
- A lightweight, open-source blueprint for building powerful and scalable LLM chat applications ☆28 · Updated 11 months ago
- Inference code for mixtral-8x7b-32kseqlen ☆100 · Updated last year
- Command-line tool for the Deep Infra cloud ML inference service ☆30 · Updated 11 months ago
- ☆39 · Updated 2 years ago
- A super simple web interface to perform blind tests on LLM outputs. ☆28 · Updated last year
- Inference of Llama/Llama2/Llama3 models in NumPy ☆20 · Updated last year
- ☆27 · Updated 8 months ago
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated 2 years ago
- AirLLM 70B inference with a single 4GB GPU ☆12 · Updated 9 months ago
- 33B Chinese LLM, DPO QLoRA, 100K context, AirLLM 70B inference with a single 4GB GPU ☆13 · Updated last year
- A minimal implementation of GraphRAG, designed to quickly prototype whether you're able to get good sense-making out of a large dataset w… ☆28 · Updated 3 months ago
- Scripts to create your own MoE models using MLX ☆89 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆52 · Updated last year
- Python bindings for ggml ☆140 · Updated 8 months ago
- Transformer GPU VRAM estimator ☆61 · Updated last year