inferless / triton-co-pilot
Generate glue code in seconds to simplify your NVIDIA Triton Inference Server deployments
☆20 · Updated 11 months ago
Alternatives and similar repositories for triton-co-pilot
Users interested in triton-co-pilot are comparing it to the libraries listed below
- ☆13 · Updated last year
- The official evaluation suite and dynamic data release for MixEval. ☆11 · Updated 8 months ago
- First token cutoff sampling inference example ☆30 · Updated last year
- ☆18 · Updated 4 months ago
- NLP with Rust for Python 🦀🐍 ☆62 · Updated 3 weeks ago
- A text-to-SQL prototype on the Northwind SQLite dataset ☆12 · Updated 8 months ago
- Benchmark study on LanceDB, an embedded vector DB, for full-text search and vector search ☆26 · Updated last year
- GraphRag vs Embeddings ☆13 · Updated 10 months ago
- ☆19 · Updated 7 months ago
- Check for data drift between two OpenAI multi-turn chat jsonl files. ☆37 · Updated last year
- LLM Workshop 2024 ☆15 · Updated 8 months ago
- A Python framework for building AI agent systems with robust task management in the form of a graph execution engine, inference capabilit… ☆25 · Updated last week
- A Hands-on Practical Guide to LlamaIndex ☆33 · Updated 7 months ago
- Building large language foundational model ☆9 · Updated 3 months ago
- Repository containing awesome resources regarding Hugging Face tooling. ☆47 · Updated last year
- OLMost every training recipe you need to perform data interventions with the OLMo family of models. ☆30 · Updated this week
- ☆11 · Updated last year
- Asynchronous tasks on the cloud ☆21 · Updated last year
- ☆19 · Updated 9 months ago
- ☆20 · Updated 2 months ago
- Self-host LLMs with vLLM and BentoML ☆116 · Updated this week
- 📡 Deploy AI models and apps to Kubernetes without developing a hernia ☆32 · Updated last year
- Evolutionary Search for expert-level performance on any task with environmental feedback ☆14 · Updated last year
- LLM Compression Benchmark ☆21 · Updated last month
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models. ☆74 · Updated last week
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Updated 6 months ago
- Code to split a document in a consistent way based on the concept of an "idea" ☆13 · Updated 6 months ago
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆47 · Updated 2 months ago
- A collection of reproducible inference engine benchmarks ☆31 · Updated last month
- Tree-based indexes for neural-search ☆32 · Updated last year