inferless / triton-co-pilot
Generate glue code in seconds to simplify your NVIDIA Triton Inference Server deployments
☆ 20 · Updated last year
Alternatives and similar repositories for triton-co-pilot
Users interested in triton-co-pilot are comparing it to the libraries listed below.
- ☆ 13 · Updated last year
- Deploy AI models and apps to Kubernetes without developing a hernia ☆ 32 · Updated last year
- A PyTorch implementation of constrained optimization and modeling techniques ☆ 31 · Updated last year
- A high-throughput, end-to-end RL library for infinite-horizon tasks. ☆ 20 · Updated last month
- Pivotal Token Search ☆ 109 · Updated last week
- The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation. ☆ 47 · Updated 10 months ago
- A simple library for working with Hugging Face models. ☆ 14 · Updated 6 months ago
- ☆ 19 · Updated 6 months ago
- Unofficial implementation of https://arxiv.org/pdf/2407.14679 ☆ 46 · Updated 10 months ago
- Agent-based implementation of RAG, incorporating AI agents into the RAG pipeline to orchestrate its components and perform additional act… ☆ 13 · Updated 5 months ago
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆ 30 · Updated 9 months ago
- Asynchronous tasks on the cloud ☆ 21 · Updated last year
- Runner in charge of collecting metrics from LLM inference endpoints for the Unify Hub ☆ 17 · Updated last year
- Repository containing awesome resources regarding Hugging Face tooling. ☆ 47 · Updated last year
- ColBERT for live vector indexes ☆ 28 · Updated 8 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆ 26 · Updated 8 months ago
- ☆ 62 · Updated 3 months ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆ 91 · Updated 3 weeks ago
- The official evaluation suite and dynamic data release for MixEval. ☆ 11 · Updated 9 months ago
- LLM-powered analyses of your GitHub community using EvaDB ☆ 24 · Updated last year
- Supercharge Hugging Face transformers with model parallelism. ☆ 77 · Updated 9 months ago
- A tool for analysis of LLM generations. ☆ 40 · Updated last month
- A collection of all available inference solutions for LLMs ☆ 91 · Updated 4 months ago
- ☆ 20 · Updated 9 months ago
- LLM Compression Benchmark ☆ 22 · Updated last week
- TensorRT-LLM server with Structured Outputs (JSON) built with Rust ☆ 55 · Updated 2 months ago
- A high-performance constrained decoding engine based on context-free grammar in Rust ☆ 54 · Updated last month
- Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. ☆ 47 · Updated this week
- Run structured LLM inference with easy parallelism ☆ 16 · Updated 5 months ago
- Train, tune, and infer the Bamba model ☆ 130 · Updated last month