inferless / triton-co-pilot
Generate Glue Code in seconds to simplify your Nvidia Triton Inference Server Deployments
☆20 Updated last year
Alternatives and similar repositories for triton-co-pilot
Users that are interested in triton-co-pilot are comparing it to the libraries listed below
- ☆12 Updated last year
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆139 Updated last year
- A collection of all available inference solutions for the LLMs ☆93 Updated 9 months ago
- ☆48 Updated last month
- Pivotal Token Search ☆132 Updated last week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆62 Updated 2 months ago
- NLP with Rust for Python 🦀🐍 ☆70 Updated 6 months ago
- Set of scripts to finetune LLMs ☆38 Updated last year
- ☆55 Updated last year
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆68 Updated 3 weeks ago
- TitanML Takeoff Server is an optimization, compression and deployment platform that makes state of the art machine learning models access… ☆114 Updated last year
- The official evaluation suite and dynamic data release for MixEval. ☆11 Updated last year
- Train, tune, and infer Bamba model ☆137 Updated 6 months ago
- Vector Database with support for late interaction and token level embeddings. ☆54 Updated 5 months ago
- ☆21 Updated last year
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 Updated 2 months ago
- Code for NeurIPS LLM Efficiency Challenge ☆59 Updated last year
- ☆148 Updated last year
- ☆66 Updated 8 months ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆103 Updated 6 months ago
- Iterate fast on your RAG pipelines ☆23 Updated 5 months ago
- ☆42 Updated last year
- ☆136 Updated last year
- ☆81 Updated last month
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas… ☆217 Updated 2 months ago
- Repository containing awesome resources regarding Hugging Face tooling. ☆48 Updated last year
- Your buddy in the (L)LM space. ☆64 Updated last year
- experiments with inference on llama ☆103 Updated last year
- ☆79 Updated 2 weeks ago
- KV Cache Steering for Inducing Reasoning in Small Language Models ☆42 Updated 4 months ago