philpax / ggml
Tensor library for machine learning
☆20 · Updated last year
Related projects
Alternatives and complementary repositories for ggml
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆89 · Updated this week
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho… ☆100 · Updated 3 weeks ago
- Light WebUI for lm.rs ☆22 · Updated last month
- Command line tool for Deep Infra cloud ML inference service ☆26 · Updated 5 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆23 · Updated last week
- Port of Facebook's LLaMA model in C/C++ ☆20 · Updated last year
- Inference code for mixtral-8x7b-32kseqlen ☆98 · Updated 11 months ago
- tinygrad port of the RWKV large language model. ☆43 · Updated 5 months ago
- Transformer GPU VRAM estimator ☆40 · Updated 7 months ago
- General-purpose GPU compute framework built on Vulkan to support 1000s of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆41 · Updated last month
- instinct.cpp provides ready-to-use alternatives to the OpenAI Assistant API and built-in utilities for developing AI agent applications (RAG,… ☆37 · Updated 4 months ago
- A library for simplifying fine-tuning with multi-GPU setups in the Hugging Face ecosystem. ☆15 · Updated 3 weeks ago
- Experiments with BitNet inference on CPU ☆50 · Updated 7 months ago
- QuIP quantization ☆46 · Updated 8 months ago
- Inference Llama/Llama2 models in NumPy ☆20 · Updated last year
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆17 · Updated 11 months ago
- GitHub repo for Peifeng's internship project ☆12 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆262 · Updated last year
- ToK (Tree of Knowledge) for large language models (LLMs): a novel dataset that inspires knowledge symbolic correlation in simple inpu… ☆46 · Updated last year
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆36 · Updated last year
- AirLLM: 70B inference with a single 4GB GPU ☆12 · Updated 3 months ago
- Inference Llama 2 in one file of pure C++ ☆80 · Updated last year
- Spherically merge PyTorch/HF-format language models with minimal feature loss. ☆112 · Updated last year
- AI assistant running within your browser. ☆44 · Updated 3 weeks ago
- Testing LLM reasoning abilities with family-relationship quizzes. ☆43 · Updated this week
- cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a… ☆23 · Updated this week