France-Travail / happy_vllm

A production-ready REST API for vLLM

☆22

Alternatives and similar repositories for happy_vllm

Users interested in happy_vllm are comparing it to the libraries listed below.


kyegomez / FastFF
Zeta implementation of a reusable, plug-and-play feedforward layer from the paper "Exponentially Faster Language Modeling"
☆16 · Updated 8 months ago
jina-ai / submodular-optimization
Submodular optimization for context engineering: query fan-out, text selection, passage reranking
☆33 · Updated this week
huggingface / huggingface-inference-toolkit
Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models.
☆82 · Updated this week
opendatahub-io / vllm-tgis-adapter
vLLM adapter for a TGIS-compatible gRPC server.
☆33 · Updated this week
Zyphra / transformers_zamba2
☆48 · Updated 5 months ago
mmhamdy / open-language-models
A list of language models with permissive licenses such as MIT or Apache 2.0
☆24 · Updated 4 months ago
tanyuqian / cappy
NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
☆43 · Updated last year
The-Inscrutable-X / TACQ
Official Repository for Task-Circuit Quantization
☆20 · Updated last month
jina-ai / llm-query-expansion
Query Expansion for Better Query Embedding using LLMs
☆52 · Updated 4 months ago
LLM360 / crystalcoder-data-prep
Data preparation code for CrystalCoder 7B LLM
☆45 · Updated last year
ElleLeonne / Lightning-ReLoRA
A public implementation of the ReLoRA pretraining method, built on Lightning AI's PyTorch Lightning suite.
☆33 · Updated last year
IlyasMoutawwakil / py-txi
A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.
☆33 · Updated 2 months ago
yilinjz / astchunk
ASTChunk is a Python toolkit for code chunking using Abstract Syntax Trees (ASTs), designed to create structurally sound and meaningful c…
☆30 · Updated 2 weeks ago
Zyphra / Zyda_processing
☆35 · Updated last year
deep-diver / LLM-Pref-Mark-UI
☆37 · Updated 2 years ago
austinsilveria / tricksy
Fast approximate inference on a single GPU with sparsity-aware offloading
☆38 · Updated last year
facebookresearch / lss_eval
This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…
☆31 · Updated last year
LLM360 / k2-data-prep
☆20 · Updated last year
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58-bit language models
☆81 · Updated last month
ibm-granite / granite-embedding-models
☆29 · Updated 2 weeks ago
google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆34 · Updated last year
lilakk / BLEUBERI
Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"
☆25 · Updated last month
lightblue-tech / lb-reranker
☆23 · Updated 5 months ago
HITsz-TMG / KaLM-Embedding
Code for KaLM-Embedding models
☆85 · Updated 2 weeks ago
dmarx / zero-shot-intent-classifier
Minimal zero-shot intent classifier for arbitrary intent slot filling, via LLM prompting with LangChain.
☆33 · Updated 2 years ago
bentoml / BentoLMDeploy
Self-host LLMs with LMDeploy and BentoML
☆20 · Updated last week
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆55 · Updated 2 weeks ago
cxcscmu / RAGViz
Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]
☆85 · Updated 5 months ago
kyegomez / LM-Infinite
Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
☆41 · Updated 8 months ago
AlexBodner / How_Much_VRAM
☆101 · Updated 10 months ago