inferless / triton-co-pilot
Generate glue code in seconds to simplify your NVIDIA Triton Inference Server deployments
☆18 Updated 6 months ago
Alternatives and similar repositories for triton-co-pilot:
Users who are interested in triton-co-pilot are comparing it to the libraries listed below.
- The official evaluation suite and dynamic data release for MixEval. ☆10 Updated 4 months ago
- ☆12 Updated 2 weeks ago
- Creating Generative AI Apps which work ☆16 Updated 6 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆24 Updated 2 months ago
- ☆18 Updated 3 months ago
- NLP with Rust for Python 🦀🐍 ☆60 Updated 7 months ago
- Self-host LLMs with vLLM and BentoML ☆79 Updated 2 weeks ago
- Training hybrid models for dummies. ☆18 Updated 2 weeks ago
- Check for data drift between two OpenAI multi-turn chat jsonl files. ☆37 Updated 9 months ago
- Nexusflow function call, tool use, and agent benchmarks. ☆19 Updated last month
- Repository containing awesome resources regarding Hugging Face tooling. ☆46 Updated last year
- ☆27 Updated 2 months ago
- Supercharge huggingface transformers with model parallelism. ☆76 Updated 3 months ago
- The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation. ☆41 Updated 5 months ago
- Supervised instruction finetuning for LLM with HF trainer and Deepspeed ☆34 Updated last year
- GraphRag vs Embeddings ☆13 Updated 6 months ago
- End-to-End LLM Guide ☆99 Updated 6 months ago
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆43 Updated last week
- A high-performance constrained decoding engine based on context free grammar in Rust ☆44 Updated 3 weeks ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆137 Updated 6 months ago
- Vector Database with support for late interaction and token level embeddings. ☆51 Updated 4 months ago
- ☆39 Updated last month
- Benchmark study on LanceDB, an embedded vector DB, for full-text search and vector search ☆22 Updated last year
- Runner in charge of collecting metrics from LLM inference endpoints for the Unify Hub ☆17 Updated 11 months ago
- One Line To Build Zero-Data Classifiers in Minutes ☆33 Updated 4 months ago
- A high throughput, end-to-end RL library for infinite horizon tasks. ☆18 Updated 8 months ago
- Wonderful Matrices to Build Small Language Models ☆41 Updated last week
- First token cutoff sampling inference example ☆29 Updated last year
- ☆10 Updated 7 months ago