philpax / ggml
Tensor library for machine learning
☆21 Updated last year
Alternatives and similar repositories for ggml:
Users who are interested in ggml are comparing it to the libraries listed below.
- First token cutoff sampling inference example ☆30 Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆86 Updated this week
- Port of Facebook's LLaMA model in C/C++ ☆20 Updated last year
- Inference Llama/Llama2/Llama3 Modes in NumPy ☆20 Updated last year
- LLM inference in C/C++ ☆71 Updated this week
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models ☆35 Updated last year
- ☆46 Updated 9 months ago
- Experiments with BitNet inference on CPU ☆53 Updated last year
- ☆12 Updated 7 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆11 Updated 7 months ago
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆46 Updated 2 months ago
- inference code for mixtral-8x7b-32kseqlen ☆99 Updated last year
- Train your own small bitnet model ☆67 Updated 6 months ago
- Public reports detailing responses to sets of prompts by Large Language Models. ☆30 Updated 3 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 Updated 5 months ago
- Inference Llama 2 in one file of pure C++ ☆83 Updated last year
- Light WebUI for lm.rs ☆23 Updated 6 months ago
- Command line tool for Deep Infra cloud ML inference service ☆30 Updated 10 months ago
- Chunk Dedupe Estimation ☆14 Updated 5 months ago
- 1.58-bit LLaMa model ☆81 Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 Updated 11 months ago
- The DPAB-α Benchmark ☆20 Updated 3 months ago
- Inference code for LLaMA models ☆42 Updated 2 years ago
- Lossless normalization of uppercase characters ☆11 Updated last year
- Data preparation code for CrystalCoder 7B LLM ☆44 Updated 11 months ago
- GGML implementation of BERT model with Python bindings and quantization. ☆56 Updated last year
- Training hybrid models for dummies. ☆20 Updated 3 months ago
- Extract structured data from local or remote LLM models ☆41 Updated 10 months ago
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆17 Updated last year
- GraphRag vs Embeddings ☆13 Updated 9 months ago