wangcx18 / llm-vscode-inference-server
An endpoint server for efficiently serving quantized open-source LLMs for code.
☆54Updated last year
Alternatives and similar repositories for llm-vscode-inference-server:
Users who are interested in llm-vscode-inference-server are comparing it to the libraries listed below.
- Visual Studio Code extension for WizardCoder☆145Updated last year
- Some simple scripts that I use day-to-day when working with LLMs and Huggingface Hub☆156Updated last year
- Host the GPTQ model using AutoGPTQ as an API that is compatible with text generation UI API.☆91Updated last year
- StarCoder server for huggingface-vscode custom endpoint☆168Updated last year
- ☆199Updated last year
- ☆38Updated last year
- Deploy your GGML models to HuggingFace Spaces with Docker and gradio☆36Updated last year
- run ollama & gguf easily with a single command☆49Updated 9 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours☆58Updated 5 months ago
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API☆44Updated 4 months ago
- An OpenAI Completions API compatible server for NLP transformers models☆63Updated last year
- This is our own implementation of 'Layer Selective Rank Reduction'☆232Updated 8 months ago
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes…☆147Updated last year
- ☆123Updated last week
- GPT-2 small trained on phi-like data☆65Updated 11 months ago
- GRDN.AI app for garden optimization☆70Updated last year
- ☆55Updated last year
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡☆64Updated 3 months ago
- ☆152Updated 7 months ago
- A guidance compatibility layer for llama-cpp-python☆34Updated last year
- Embed anything.☆29Updated 8 months ago
- ☆65Updated 8 months ago
- Very basic framework for parameterized large language model (Q)LoRA / (Q)Dora fine-tuning using mlx, mlx_lm, and OgbujiPT. Architecture …☆37Updated 2 weeks ago
- ☆74Updated last year
- Let's create synthetic textbooks together :)☆73Updated last year
- cli tool to quantize gguf, gptq, awq, hqq and exl2 models☆68Updated last month
- Experimental sampler to make LLMs more creative☆30Updated last year
- Python bindings for llama.cpp☆65Updated 11 months ago
- ☆52Updated last month
- LLM finetuning☆42Updated last year