An OpenAI API compatible LLM inference server based on ExLlamaV2.
☆25 · Feb 9, 2024 · Updated 2 years ago
Alternatives and similar repositories for exllamav2-openai-server
Users that are interested in exllamav2-openai-server are comparing it to the libraries listed below.
- Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and… ☆51 · May 19, 2025 · Updated 10 months ago
- Train Llama Loras Easily ☆31 · Aug 3, 2023 · Updated 2 years ago
- A fast batching API to serve LLM models ☆189 · Apr 26, 2024 · Updated last year
- Automatic Drone Control is an airborne drone obstacle-avoidance and navigation program ☆12 · Sep 24, 2015 · Updated 10 years ago
- Simple node proxy for llama-server that enables MCP use ☆19 · May 10, 2025 · Updated 11 months ago
- Easily convert HuggingFace models to GGUF-format for llama.cpp ☆23 · Jul 27, 2024 · Updated last year
- A free AI text generation interface based on KoboldAI ☆33 · Feb 27, 2024 · Updated 2 years ago
- Fill up the `model_list` field in your LiteLLM proxy configuration file ☆10 · Sep 7, 2024 · Updated last year
- QuIP quantization ☆64 · Mar 17, 2024 · Updated 2 years ago
- Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival. ☆33 · Jul 12, 2024 · Updated last year
- Multi-Domain Expert Learning ☆67 · Jan 23, 2024 · Updated 2 years ago
- ☆28 · Aug 30, 2023 · Updated 2 years ago
- Extend the Conditioning of Stable Diffusion to take Audio Embeddings Instead of Text Embeddings using Wav2Vec2-BERT model ☆13 · Sep 25, 2024 · Updated last year
- CoreXY conversion for the Folgertech FT-5 printer ☆15 · Feb 20, 2024 · Updated 2 years ago
- ☆27 · Mar 13, 2024 · Updated 2 years ago
- LLM Skirmish ☆45 · Feb 3, 2026 · Updated 2 months ago
- A hybrid LaTeX code + rich-text editor. It seamlessly syncs a rich-text What You See Is What You Get (WYSIWYG) view with raw LaTeX source… ☆32 · Mar 26, 2026 · Updated 3 weeks ago
- The official API server for Exllama. OAI compatible, lightweight, and fast. ☆1,175 · Updated this week
- A program to automate testing open source LLMs for their political compass scores ☆12 · Nov 28, 2023 · Updated 2 years ago
- mlx implementations of various transformers, speedups, training ☆33 · Dec 14, 2023 · Updated 2 years ago
- A python package for serving LLM on OpenAI-compatible API endpoints with prompt caching using MLX. ☆102 · Jun 29, 2025 · Updated 9 months ago
- Lossless normalization of uppercase characters ☆11 · Jul 3, 2023 · Updated 2 years ago
- A modified Ziggurat Algorithm for efficiently generating exponentially- and normally-distributed PseudoRandom Numbers (PRNs). ☆13 · May 21, 2025 · Updated 10 months ago
- ☆11 · Jul 23, 2023 · Updated 2 years ago
- 33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU ☆13 · May 5, 2024 · Updated last year
- Conversion script adapting vicuna dataset into alpaca format for use with oobabooga's trainer ☆13 · Jun 21, 2023 · Updated 2 years ago
- Raw image editor with built-in film emulation. ☆23 · Apr 9, 2026 · Updated last week
- ChatGPT solutions for the MLE interview ☆14 · Dec 9, 2022 · Updated 3 years ago
- Finding explainable models to predict Formula 1 Qualifying Results ☆13 · Apr 7, 2022 · Updated 4 years ago
- miaoshouai-assistant for webui-forge ☆15 · Aug 15, 2024 · Updated last year
- Large-scale LLM inference engine ☆1,695 · Mar 12, 2026 · Updated last month
- ☆17 · Nov 25, 2023 · Updated 2 years ago
- Optimization solvers in pure Python: LP, MILP, SAT, constraint programming, graph and metaheuristics. No dependencies. Solvor all your op… ☆27 · Apr 7, 2026 · Updated last week
- Two-way sync between Valtio proxies and Yjs CRDTs ☆21 · Feb 12, 2026 · Updated 2 months ago
- ☆14 · Dec 16, 2022 · Updated 3 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆132 · Jun 25, 2024 · Updated last year
- WPG: WavePropaGator, Interactive framework for X-ray FEL optics design and simulations. ☆30 · Dec 29, 2025 · Updated 3 months ago
- ultimate openpose editor with render ☆36 · Jun 1, 2025 · Updated 10 months ago
- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs ☆52 · Jul 10, 2024 · Updated last year