wangcx18 / llm-vscode-inference-server
An endpoint server for efficiently serving quantized open-source LLMs for code.
☆54 · Updated last year
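The server described above exposes an HTTP endpoint that an editor extension such as llm-vscode can query for code completions. As a rough sketch only (the route, port, and request/response field names below are assumptions modeled on TGI-style `/generate` APIs, not taken from this listing), a client might look like:

```python
import json
import urllib.request


def build_request(prompt: str, max_new_tokens: int = 64) -> dict:
    # TGI-style payload; these field names are an assumption about the
    # server's API, not confirmed by the repository listing.
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": 0.2},
    }


def complete(prompt: str, url: str = "http://localhost:8000/generate") -> str:
    # Hypothetical endpoint URL; adjust to match the server's actual config.
    data = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```

In practice the editor extension sends the code surrounding the cursor as the prompt and inserts the returned completion inline.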
Alternatives and similar repositories for llm-vscode-inference-server:
Users interested in llm-vscode-inference-server are comparing it to the libraries listed below.
- starcoder server for huggingface-vscode custom endpoint ☆171 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆64 · Updated last year
- An OpenAI Completions API compatible server for NLP transformer models ☆65 · Updated last year
- Some simple scripts that I use day-to-day when working with LLMs and the Hugging Face Hub ☆160 · Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated 11 months ago
- Host a GPTQ model using AutoGPTQ as an API that is compatible with the text generation UI API. ☆91 · Updated last year
- ☆199 · Updated last year
- GPT-2 small trained on phi-like data ☆66 · Updated last year
- ☆73 · Updated last year
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… ☆147 · Updated last year
- A guidance compatibility layer for llama-cpp-python ☆34 · Updated last year
- Simple, fast, parallel Hugging Face GGML model downloader written in Python ☆24 · Updated last year
- ☆66 · Updated 10 months ago
- Self-host LLMs with vLLM and BentoML ☆105 · Updated last week
- 4-bit quantization of SantaCoder using GPTQ ☆51 · Updated last year
- Extends the original llama.cpp repo to support the RedPajama model. ☆117 · Updated 7 months ago
- ☆38 · Updated last year
- ☆55 · Updated last year
- Self-hosted LLM chatbot arena, with yourself as the only judge ☆39 · Updated last year
- Local LLaMAs/Models in VSCode ☆53 · Updated last year
- 🚀 Scale your RAG pipeline using Ragswift: a scalable centralized embeddings management platform ☆38 · Updated last year
- Deploy your GGML models to Hugging Face Spaces with Docker and Gradio ☆36 · Updated last year
- A practical and advanced guide to LLMOps. It provides a solid understanding of large language models’ general concepts, deployment techniqu… ☆63 · Updated 8 months ago
- Falcon LLM ggml framework with CPU and GPU support ☆246 · Updated last year
- Merge Transformers language models using gradient parameters. ☆206 · Updated 8 months ago
- ☆57 · Updated 3 weeks ago
- Low-rank adapter extraction for fine-tuned transformers models ☆171 · Updated 11 months ago
- A stable, fast, and easy-to-use inference library with a focus on a sync-to-async API ☆45 · Updated 6 months ago
- Dagger functions to import Hugging Face GGUF models into a local ollama instance and optionally push them to ollama.com. ☆115 · Updated 11 months ago
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ☆37 · Updated last year