TabbyML / registry-tabby
☆31 · Updated 3 months ago
Alternatives and similar repositories for registry-tabby
Users interested in registry-tabby are comparing it to the libraries listed below.
- LLM powered development for IntelliJ ☆80 · Updated last year
- Self-hosted LLM chatbot arena, with yourself as the only judge ☆40 · Updated last year
- llama.cpp fork used by GPT4All ☆55 · Updated 2 months ago
- AirLLM 70B inference with single 4GB GPU ☆12 · Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels) ☆37 · Updated last week
- Refact AI: Open-source AI code assistant with autocompletion, chat, refactoring and more for IntelliJ JetBrains IDEs ☆60 · Updated this week
- Web tool to count LLM tokens (GPT, Claude, Llama, ...) ☆31 · Updated last week
- Thin wrapper around GGML to make life easier ☆29 · Updated this week
- Simple, fast, parallel Hugging Face GGML model downloader written in Python ☆24 · Updated last year
- Use safetensors with ONNX 🤗 ☆57 · Updated 2 months ago
- Inference of Large Multimodal Models in C/C++. LLaVA and others ☆46 · Updated last year
- GGML implementation of BERT model with Python bindings and quantization. ☆56 · Updated last year
- The heart of The Pulsar App: fast, secure and shared inference with a modern UI ☆56 · Updated 5 months ago
- Copilot X-like features for JetBrains IDEs using ChatGPT and GPT-4 ☆40 · Updated last year
- Rust executable for Refact Agent; it lives inside your IDE and keeps AST and VecDB indexes up to date, offers agentic tools for an AI mod… ☆59 · Updated 2 months ago
- TensorRT-LLM server with Structured Outputs (JSON) built with Rust ☆52 · Updated 3 weeks ago
- Testing LLM reasoning abilities with family relationship quizzes. ☆62 · Updated 3 months ago
- Forces DeepSeek R1 models to engage in extended reasoning by intercepting early termination tokens. ☆19 · Updated 3 months ago
- Various LLM benchmarks ☆19 · Updated last week
- Nexusflow function call, tool use, and agent benchmarks. ☆19 · Updated 5 months ago
- GGML implementation of BERT model with Python bindings and quantization. ☆26 · Updated last year
- Running Microsoft's BitNet via Electron, React & Astro ☆38 · Updated 3 weeks ago
- Source code for Intel's Polite Guard NLP project ☆33 · Updated this week
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆138 · Updated last week
- An endpoint server for efficiently serving quantized open-source LLMs for code. ☆55 · Updated last year
- CI for ggml and related projects ☆29 · Updated this week
- Fast state-of-the-art speech models and a runtime that runs anywhere 💥 ☆55 · Updated 3 months ago
- Rust standalone inference of Namo-500M series models. Extremely tiny, running VLM on CPU. ☆24 · Updated 2 months ago
- A proxy that hosts multiple single-model runners such as llama.cpp and vLLM ☆12 · Updated last month
- cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a… ☆40 · Updated this week