inferless / triton-co-pilot
Generate Glue Code in seconds to simplify your Nvidia Triton Inference Server Deployments
☆20 · Updated last year
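For context, deploying a model on Triton's Python backend usually means hand-writing a `model.py` along the lines of the minimal sketch below; triton-co-pilot aims to generate this kind of boilerplate for you. The tensor names (`INPUT0`, `OUTPUT0`) and the toy scaling step are illustrative assumptions, not code taken from the repository.

```python
# Minimal sketch of a Triton Python-backend model.py (illustrative only).
import numpy as np
import triton_python_backend_utils as pb_utils  # provided inside the Triton container


class TritonPythonModel:
    def initialize(self, args):
        # Load model weights / resources once per model instance.
        self.scale = 2.0  # placeholder for a real model

    def execute(self, requests):
        # Triton batches incoming requests; return one response per request.
        responses = []
        for request in requests:
            inp = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            out = inp.astype(np.float32) * self.scale  # placeholder "inference"
            out_tensor = pb_utils.Tensor("OUTPUT0", out)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        # Release resources when the model is unloaded.
        pass
```

A file like this is paired with a `config.pbtxt` describing the model's inputs, outputs, and backend, which is the other piece of glue the tool targets.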
Alternatives and similar repositories for triton-co-pilot
Users interested in triton-co-pilot are comparing it to the libraries listed below.
- ☆12 · Updated last year
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. · ☆138 · Updated last year
- Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K… · ☆83 · Updated 9 months ago
- TitanML Takeoff Server is an optimization, compression and deployment platform that makes state of the art machine learning models access… · ☆114 · Updated last year
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas… · ☆204 · Updated this week
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research · ☆251 · Updated this week
- This repository has code for fine-tuning LLMs with GRPO specifically for Rust Programming using cargo as feedback · ☆107 · Updated 7 months ago
- ☆133 · Updated last year
- ☆78 · Updated 8 months ago
- experiments with inference on llama · ☆104 · Updated last year
- Fine-tune an LLM to perform batch inference and online serving. · ☆112 · Updated 4 months ago
- Curated list of awesome material on optimization techniques to make artificial intelligence faster and more efficient 🚀 · ☆119 · Updated 2 years ago
- Your buddy in the (L)LM space. · ☆64 · Updated last year
- NLP with Rust for Python 🦀🐍 · ☆65 · Updated 4 months ago
- The official evaluation suite and dynamic data release for MixEval. · ☆11 · Updated last year
- A collection of all available inference solutions for the LLMs · ☆91 · Updated 7 months ago
- Self-host LLMs with vLLM and BentoML · ☆150 · Updated last week
- IBM development fork of https://github.com/huggingface/text-generation-inference · ☆61 · Updated 3 weeks ago
- Formatron empowers everyone to control the format of language models' output with minimal overhead. · ☆225 · Updated 4 months ago
- Machine Learning Serving focused on GenAI with simplicity as the top priority. · ☆59 · Updated 3 months ago
- The backend behind the LLM-Perf Leaderboard · ☆10 · Updated last year
- Official implementation of MetaTree: Learning a Decision Tree Algorithm with Transformers · ☆114 · Updated last year
- Train, tune, and infer Bamba model · ☆133 · Updated 4 months ago
- Code for NeurIPS LLM Efficiency Challenge · ☆59 · Updated last year
- PyTorch implementation of models from the Zamba2 series. · ☆185 · Updated 8 months ago
- Iterate fast on your RAG pipelines · ☆23 · Updated 3 months ago
- a pipeline for using api calls to agnostically convert unstructured data into structured training data · ☆31 · Updated last year
- ☆48 · Updated last year
- ☆12 · Updated 8 months ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System · ☆142 · Updated last year