inferless / triton-co-pilot
Generate Glue Code in seconds to simplify your Nvidia Triton Inference Server Deployments
☆20 · Updated last year
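For context, deploying a model on Triton's Python backend usually means hand-writing a `model.py` along the lines of the minimal sketch below; triton-co-pilot aims to generate this kind of boilerplate for you. The tensor names (`INPUT0`, `OUTPUT0`) and the toy scaling step are illustrative assumptions, not code taken from the repository.

```python
# Minimal sketch of a Triton Python-backend model.py (illustrative only).
import numpy as np
import triton_python_backend_utils as pb_utils  # provided inside the Triton container


class TritonPythonModel:
    def initialize(self, args):
        # Load model weights / resources once per model instance.
        self.scale = 2.0  # placeholder for a real model

    def execute(self, requests):
        # Triton batches incoming requests; return one response per request.
        responses = []
        for request in requests:
            inp = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            out = inp.astype(np.float32) * self.scale  # placeholder "inference"
            out_tensor = pb_utils.Tensor("OUTPUT0", out)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        # Release resources when the model is unloaded.
        pass
```

A file like this is paired with a `config.pbtxt` describing the model's inputs, outputs, and backend, which is the other piece of glue the tool targets.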
Alternatives and similar repositories for triton-co-pilot
Users interested in triton-co-pilot are comparing it to the libraries listed below.
- ☆12 · Updated last year
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. · ☆138 · Updated last year
- Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K… · ☆83 · Updated 9 months ago
- TitanML Takeoff Server is an optimization, compression and deployment platform that makes state of the art machine learning models access… · ☆114 · Updated last year
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas… · ☆204 · Updated this week
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research · ☆251 · Updated this week
- This repository has code for fine-tuning LLMs with GRPO specifically for Rust Programming using cargo as feedback · ☆107 · Updated 7 months ago
- ☆133 · Updated last year
- ☆78 · Updated 8 months ago
- experiments with inference on llama · ☆104 · Updated last year
- Fine-tune an LLM to perform batch inference and online serving. · ☆112 · Updated 4 months ago
- Curated list of awesome material on optimization techniques to make artificial intelligence faster and more efficient 🚀 · ☆119 · Updated 2 years ago
- Your buddy in the (L)LM space. · ☆64 · Updated last year
- NLP with Rust for Python 🦀🐍 · ☆65 · Updated 4 months ago
- The official evaluation suite and dynamic data release for MixEval. · ☆11 · Updated last year
- A collection of all available inference solutions for the LLMs · ☆91 · Updated 7 months ago
- Self-host LLMs with vLLM and BentoML · ☆150 · Updated last week
- IBM development fork of https://github.com/huggingface/text-generation-inference · ☆61 · Updated 3 weeks ago
- Formatron empowers everyone to control the format of language models' output with minimal overhead. · ☆225 · Updated 4 months ago
- Machine Learning Serving focused on GenAI with simplicity as the top priority. · ☆59 · Updated 3 months ago
- The backend behind the LLM-Perf Leaderboard · ☆10 · Updated last year
- Official implementation of MetaTree: Learning a Decision Tree Algorithm with Transformers · ☆114 · Updated last year
- Train, tune, and infer Bamba model · ☆133 · Updated 4 months ago
- Code for NeurIPS LLM Efficiency Challenge · ☆59 · Updated last year
- PyTorch implementation of models from the Zamba2 series. · ☆185 · Updated 8 months ago
- Iterate fast on your RAG pipelines · ☆23 · Updated 3 months ago
- a pipeline for using api calls to agnostically convert unstructured data into structured training data · ☆31 · Updated last year
- ☆48 · Updated last year
- ☆12 · Updated 8 months ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System · ☆142 · Updated last year