bjj / exllamav2-openai-server
An OpenAI API compatible LLM inference server based on ExLlamaV2.
☆25 Updated last year
Alternatives and similar repositories for exllamav2-openai-server
Users who are interested in exllamav2-openai-server are comparing it to the libraries listed below.
- ☆27 Updated last year
- Experimental sampler to make LLMs more creative ☆31 Updated last year
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆43 Updated last year
- GPT-2 small trained on phi-like data ☆66 Updated last year
- ☆73 Updated last year
- Model REVOLVER, a human-in-the-loop model mixing system. ☆33 Updated last year
- entropix style sampling + GUI ☆26 Updated 7 months ago
- Easily convert HuggingFace models to GGUF format for llama.cpp ☆21 Updated 10 months ago
- Glyphs, acting as collaboratively defined symbols linking related concepts, add a layer of multidimensional semantic richness to user-AI … ☆49 Updated 4 months ago
- ☆31 Updated last year
- ☆53 Updated last year
- ☆20 Updated last year
- Simple, Fast, Parallel Huggingface GGML model downloader written in Python ☆24 Updated last year
- Self-hosted LLM chatbot arena, with yourself as the only judge ☆41 Updated last year
- Train Llama Loras Easily ☆31 Updated last year
- Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper. ☆24 Updated 3 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 Updated last year
- Demonstration that fine-tuning a RoPE model on longer sequences than it was pre-trained on extends the model's context limit ☆63 Updated 2 years ago
- Modified Beam Search with periodic restarts ☆12 Updated 9 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 Updated last year
- Local LLM inference & management server with built-in OpenAI API ☆31 Updated last year
- 5X faster 60% less memory QLoRA finetuning ☆21 Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆64 Updated last year
- Run ollama & gguf easily with a single command ☆51 Updated last year
- ☆22 Updated last year
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆22 Updated 7 months ago
- An Extension for oobabooga/text-generation-webui ☆36 Updated last year
- LLM-backed Fantasy Tribe Game ☆18 Updated 7 months ago
- Generates control vectors for use with llama.cpp in GGUF format. ☆25 Updated 3 months ago
- Yet another frontend for LLM, written using .NET and WinUI 3 ☆10 Updated 7 months ago