philpax / ggml
Tensor library for machine learning
☆21 · Updated last year
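For orientation, here is a minimal sketch of how a ggml program is typically structured: allocate a context, declare tensors and ops to lazily build a compute graph, then evaluate it. This assumes mainline ggml's context-based C API (ggml_init, ggml_new_graph, ggml_graph_compute_with_ctx); this fork may track an older revision with slightly different signatures.

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    // Reserve a small arena for tensor data and graph metadata
    struct ggml_init_params params = {
        .mem_size   = 16 * 1024 * 1024,
        .mem_buffer = NULL,
        .no_alloc   = false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // Two 1-D float tensors and an element-wise product node (not computed yet)
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor * c = ggml_mul(ctx, a, b);

    // Fill the inputs
    for (int i = 0; i < 4; i++) {
        ((float *) a->data)[i] = (float) i;
        ((float *) b->data)[i] = 2.0f;
    }

    // Build the forward graph and evaluate it on one thread
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_graph_compute_with_ctx(ctx, gf, 1);

    for (int i = 0; i < 4; i++) {
        printf("%f\n", ((float *) c->data)[i]);
    }

    ggml_free(ctx);
    return 0;
}
```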
Alternatives and similar repositories for ggml:
Users interested in ggml are comparing it to the libraries listed below
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 · Updated last week
- Command line tool for the Deep Infra cloud ML inference service ☆26 · Updated 7 months ago
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆43 · Updated 3 months ago
- Simple, Fast, Parallel Huggingface GGML model downloader written in Python ☆24 · Updated last year
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆17 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆51 · Updated last year
- Inference Llama/Llama2/Llama3 Models in NumPy ☆20 · Updated last year
- Experiments with BitNet inference on CPU ☆52 · Updated 9 months ago
- GraphRAG vs Embeddings ☆13 · Updated 6 months ago
- Public reports detailing responses to sets of prompts by Large Language Models. ☆28 · Updated 2 weeks ago
- Some random tools for working with the GGUF file format ☆24 · Updated last year
- A quick and optimized solution to manage llama-based GGUF quantized models, download GGUF files, retrieve message formatting, add more mo… ☆12 · Updated last year
- Light WebUI for lm.rs ☆23 · Updated 3 months ago
- ☆44 · Updated 6 months ago
- A collection of all available inference solutions for LLMs ☆74 · Updated 4 months ago
- GitHub repo for Peifeng's internship project ☆13 · Updated last year
- AirLLM 70B inference with single 4GB GPU ☆12 · Updated 5 months ago
- ☆82 · Updated last month
- Plug n Play GBNF Compiler for llama.cpp ☆23 · Updated last year
- GPU Power and Performance Manager ☆51 · Updated 3 months ago
- instinct.cpp provides ready-to-use alternatives to the OpenAI Assistant API and built-in utilities for developing AI Agent applications (RAG,… ☆42 · Updated 6 months ago
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho… ☆106 · Updated last month
- Google TPU optimizations for transformers models ☆87 · Updated this week
- Explore training for quantized models ☆12 · Updated 2 weeks ago
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆42 · Updated 2 weeks ago
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated last year
- minimal C implementation of speculative decoding based on llama2.c ☆18 · Updated 6 months ago
- Port of Facebook's LLaMA model in C/C++ ☆20 · Updated last year
- 1.58-bit LLaMa model ☆80 · Updated 9 months ago