ngxson / ggml-easy
Thin wrapper around GGML to make life easier
☆36 · Updated 3 weeks ago
Alternatives and similar repositories for ggml-easy
Users who are interested in ggml-easy are comparing it to the libraries listed below.
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- GGML implementation of BERT model with Python bindings and quantization. ☆55 · Updated last year
- Course Project for COMP4471 on RWKV ☆17 · Updated last year
- Profile your CoreML models directly from Python 🐍 ☆28 · Updated 9 months ago
- implementation of https://arxiv.org/pdf/2312.09299 ☆21 · Updated last year
- Rust crate for some audio utilities ☆26 · Updated 4 months ago
- Find out why your CoreML model isn't running on the Neural Engine! ☆25 · Updated last year
- ☆22 · Updated last year
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆47 · Updated 3 months ago
- Run ONNX RWKV-v4 models with GPU acceleration using DirectML [Windows], or just on CPU [Windows AND Linux]; Limited to 430M model at this… ☆21 · Updated 2 years ago
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆157 · Updated 2 months ago
- Python bindings for ggml ☆142 · Updated 10 months ago
- A simple library for working with Hugging Face models. ☆14 · Updated 6 months ago
- Port of Facebook's LLaMA model in C/C++ ☆22 · Updated last year
- Lightweight package that tracks and summarizes code changes using LLMs (Large Language Models) ☆34 · Updated 4 months ago
- Use safetensors with ONNX 🤗 ☆67 · Updated 2 weeks ago
- mlx image models for Apple Silicon machines ☆82 · Updated 3 months ago
- Simple high-throughput inference library ☆120 · Updated 2 months ago
- Implementation of nougat that focuses on processing pdf locally. ☆81 · Updated 6 months ago
- cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a… ☆42 · Updated last week
- llama.cpp gguf file parser for javascript ☆43 · Updated 7 months ago
- Browse, search, and visualize ONNX models. ☆32 · Updated 2 months ago
- AirLLM 70B inference with single 4GB GPU ☆14 · Updated 2 weeks ago
- Port of Suno AI's Bark in C/C++ for fast inference ☆52 · Updated last year
- asynchronous/distributed speculative evaluation for llama3 ☆39 · Updated 11 months ago
- ☆49 · Updated this week
- Video+code lecture on building nanoGPT from scratch ☆69 · Updated last year
- This repository shows how to use Q8 kernels with `diffusers` to optimize inference of LTX-Video on ADA GPUs. ☆21 · Updated 6 months ago
- Yet Another (LLM) Web UI, made with Gemini ☆12 · Updated 6 months ago
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre… ☆20 · Updated 9 months ago