France-Travail / happy_vllm
A REST API for vLLM, production ready
☆17Updated this week
Alternatives and similar repositories for happy_vllm:
Users that are interested in happy_vllm are comparing it to the libraries listed below
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.☆34Updated last month
- ☆42Updated this week
- Code for KaLM-Embedding models☆64Updated last week
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]☆76Updated this week
- A library integrating embedding and reranker models from OpenAI, SentenceTransformers, etc. for semantic search in vector databases.☆29Updated this week
- Machine Learning Serving focused on GenAI with simplicity as the top priority.☆58Updated 2 weeks ago
- Data preparation code for CrystalCoder 7B LLM☆44Updated 8 months ago
- LLM reads a paper and produces a working prototype☆47Updated 3 weeks ago
- ☆96Updated 4 months ago
- The backend behind the LLM-Perf Leaderboard☆10Updated 8 months ago
- ☆50Updated 2 weeks ago
- Deploys a light and full OpenAI API for production with vLLM, supporting /v1/embeddings with all embedding models.☆39Updated 6 months ago
- ☆17Updated 3 weeks ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…☆53Updated last week
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)☆27Updated last year
- ☆52Updated 7 months ago
- vLLM Router☆17Updated 10 months ago
- Zeta implementation of a reusable, plug-and-play feedforward from the paper "Exponentially Faster Language Modeling"☆15Updated 2 months ago
- ☆51Updated 6 months ago
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models.☆61Updated last week
- Train, tune, and run inference with the Bamba model☆77Updated last week
- A pipeline for LLM knowledge distillation☆83Updated 5 months ago
- NanoGPT (124M) quality in 2.67B tokens☆26Updated this week
- Using APPL to reimplement popular algorithms for Large Language Models (LLMs) and prompts☆43Updated last week
- Using LlamaIndex, Redis, and OpenAI to chat with PDF documents. Supplementary material for blog post on Microsoft Developer Blog☆110Updated last year
- A list of language models with permissive licenses such as MIT or Apache 2.0☆24Updated 2 months ago
- vLLM adapter for a TGIS-compatible gRPC server.☆16Updated this week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.☆60Updated 9 months ago
- Benchmark suite for LLMs from Fireworks.ai☆64Updated last month