wangcx18 / llm-vscode-inference-serverLinks

An endpoint server for efficiently serving quantized open-source LLMs for code.

☆56

Alternatives and similar repositories for llm-vscode-inference-server

Users that are interested in llm-vscode-inference-server are comparing it to the libraries listed below

Sorting:

LucienShui / huggingface-vscode-endpoint-server
starcoder server for huggingface-vscdoe custom endpoint
☆172Updated last year
TheBlokeAI / AIScripts
Some simple scripts that I use day-to-day when working with LLMs and Huggingface Hub
☆162Updated last year
Preemo-Inc / text-generation-inference
☆199Updated last year
mistralai / vllm-release
A high-throughput and memory-efficient inference and serving engine for LLMs
☆53Updated last year
mzbac / wizardCoder-vsc
Visual Studio Code extension for WizardCoder
☆149Updated 2 years ago
nuance1979 / llama-server
LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.
☆128Updated 2 years ago
epolewski / EricLLM
A fast batching API to serve LLM models
☆185Updated last year
OpenAccess-AI-Collective / ggml-webui
Deploy your GGML models to HuggingFace Spaces with Docker and gradio
☆37Updated 2 years ago
jquesnelle / transformers-openai-api
An OpenAI Completions API compatible server for NLP transformers models
☆65Updated last year
OpenAccess-AI-Collective / servereless-runpod-ggml
☆55Updated 2 years ago
QuixiAI / OpenChatML
☆157Updated last year
QuixiAI / kraken
☆66Updated last year
thomasgauthier / LoRD
Low-Rank adapter extraction for fine-tuned transformers models
☆175Updated last year
unslothai / unsloth-studio
Unsloth Studio
☆98Updated 4 months ago
Cerebras / DocChat
GPT-4 Level Conversational QA Trained In a Few Hours
☆63Updated 11 months ago
Rivridis / LLM-Assistant
Locally running LLM with internet access
☆96Updated last month
rgbkrk / chatlab
⚡️🧪 Fast LLM Tool Calling Experimentation, big and smol
☆148Updated 10 months ago
monk1337 / auto-ollama
run ollama & gguf easily with a single command
☆52Updated last year
absadiki / pyllamacpp
Python bindings for llama.cpp
☆65Updated last year
remichu-ai / gallama
☆132Updated 3 months ago
basetenlabs / truss-examples
Examples of models deployable with Truss
☆192Updated this week
nath1295 / LLMFlex
A python package for developing AI applications with local LLMs.
☆151Updated 7 months ago
cmp-nct / ggllm.cpp
Falcon LLM ggml framework with CPU and GPU support
☆246Updated last year
taprosoft / llm_finetuning
Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GTPQ, bitsandbytes…
☆146Updated last year
Gryphe / MergeMonster
An unsupervised model merging algorithm for Transformers-based language models.
☆106Updated last year
QuixiAI / laserRMT
This is our own implementation of 'Layer Selective Rank Reduction'
☆239Updated last year
ChuloAI / oasis
Local LLaMAs/Models in VSCode
☆53Updated 2 years ago
ChuloAI / BrainChulo
Harnessing the Memory Power of the Camelids
☆146Updated last year
jllllll / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
☆64Updated last year
c0sogi / llama-api
An OpenAI-like LLaMA inference API
☆112Updated last year