TabbyML / registry-tabby
☆30 Updated last month
Alternatives and similar repositories for registry-tabby:
Users who are interested in registry-tabby are comparing it to the libraries listed below.
- ggml implementation of BERT Embedding ☆25 Updated last year
- GGML implementation of BERT model with Python bindings and quantization. ☆56 Updated last year
- Self-hosted LLM chatbot arena, with yourself as the only judge ☆38 Updated last year
- Trying to deconstruct RWKV in understandable terms ☆14 Updated last year
- Simple, Fast, Parallel Huggingface GGML model downloader written in Python ☆24 Updated last year
- AirLLM 70B inference with single 4GB GPU ☆12 Updated 7 months ago
- Inference Llama/Llama2/Llama3 Models in NumPy ☆20 Updated last year
- Download full or partial git-lfs repos without temporarily using 2x disk space ☆31 Updated last year
- 👷 Build compute kernels ☆24 Updated this week
- UnitEval is a benchmarking and evaluation tool for AutoDev Coder. ☆12 Updated last year
- An endpoint server for efficiently serving quantized open-source LLMs for code. ☆54 Updated last year
- Python bindings for ggml ☆140 Updated 7 months ago
- Local LLM inference & management server with built-in OpenAI API ☆31 Updated 11 months ago
- Proof of concept for running moshi/hibiki using webrtc ☆18 Updated last month
- Rust crate for some audio utilities ☆22 Updated 3 weeks ago
- Use safetensors with ONNX 🤗 ☆50 Updated 3 weeks ago
- Port of Suno AI's Bark in C/C++ for fast inference ☆53 Updated 11 months ago
- CI for ggml and related projects ☆25 Updated this week
- Download models from the Ollama library, without Ollama ☆68 Updated 4 months ago
- Various LLM Benchmarks ☆14 Updated last week
- Inference of Mamba models in pure C ☆187 Updated last year
- A converter and basic tester for rwkv onnx ☆42 Updated last year
- RWKV (Receptance Weighted Key Value) is an RNN with Transformer-level performance ☆39 Updated 2 years ago
- Inference of Large Multimodal Models in C/C++. LLaVA and others ☆46 Updated last year
- llama.cpp to PyTorch Converter ☆33 Updated 11 months ago
- Extension for using alternative GitHub Copilot (StarCoder API) in VSCode ☆100 Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels) ☆20 Updated last week
- BlinkDL's RWKV-v4 running in the browser ☆47 Updated 2 years ago
- A simple library for working with Hugging Face models. ☆14 Updated 3 months ago
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆91 Updated 2 months ago