philpax / ggml
Tensor library for machine learning
☆21 Updated last year
Alternatives and similar repositories for ggml
Users who are interested in ggml are comparing it to the libraries listed below.
- Port of Facebook's LLaMA model in C/C++ ☆22 Updated last year
- Inference Llama/Llama2/Llama3 Models in NumPy ☆21 Updated last year
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆18 Updated last year
- First token cutoff sampling inference example ☆30 Updated last year
- Experiments with BitNet inference on CPU ☆54 Updated last year
- A super simple web interface to perform blind tests on LLM outputs. ☆28 Updated last year
- GGUF parser in Python ☆28 Updated 10 months ago
- ☆12 Updated 9 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 Updated 7 months ago
- Public reports detailing responses to sets of prompts by Large Language Models. ☆30 Updated 5 months ago
- Train your own small bitnet model ☆72 Updated 8 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 Updated this week
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆49 Updated 4 months ago
- Inference Llama 2 in one file of pure C++ ☆83 Updated last year
- Simple, Fast, Parallel Huggingface GGML model downloader written in Python ☆24 Updated last year
- 🚀 Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform ☆38 Updated last year
- ☆48 Updated 11 months ago
- inference code for mixtral-8x7b-32kseqlen ☆100 Updated last year
- A fork of llama3.c used to do some R&D on inferencing ☆22 Updated 6 months ago
- CI for ggml and related projects ☆29 Updated this week
- Simple high-throughput inference library ☆119 Updated last month
- Scripts to create your own moe models using mlx ☆90 Updated last year
- Adapted version of llama3.np (NumPy) to a CuPy implementation for the Llama 3 model. ☆35 Updated last year
- GraphRag vs Embeddings ☆14 Updated 11 months ago
- ☆39 Updated 2 years ago
- An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆29 Updated this week
- GGML implementation of BERT model with Python bindings and quantization. ☆55 Updated last year
- Port of Facebook's LLaMA model in C/C++ ☆22 Updated 10 months ago
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆71 Updated 4 months ago
- Data preparation code for CrystalCoder 7B LLM ☆45 Updated last year