inferless / triton-co-pilot
Generate glue code in seconds to simplify your NVIDIA Triton Inference Server deployments
☆13 Updated 2 months ago
Related projects:
- A high-throughput, end-to-end RL library for infinite horizon tasks. ☆16 Updated 3 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆23 Updated last week
- Access llamafile localhost models via LLM ☆14 Updated 4 months ago
- LLama implementations benchmarking framework ☆10 Updated 10 months ago
- Evolutionary Search for expert-level performance on any task with environmental feedback ☆14 Updated 7 months ago
- A visual tool to interpret and understand PyTorch machine learning models ☆14 Updated 7 months ago
- Programmable, automated machine learning - proof of concept ☆13 Updated 3 months ago
- Repository containing awesome resources regarding Hugging Face tooling. ☆43 Updated 8 months ago
- Use Grounding DINO, Segment Anything, and CLIP to label objects in images. ☆22 Updated 8 months ago
- First token cutoff sampling inference example ☆28 Updated 8 months ago
- NLP with Rust for Python 🦀🐍 ☆57 Updated 3 months ago
- A simple create-llama template using llama-index v0.10 and integrated with Ollama ☆9 Updated 4 months ago
- A Python library for real-time PostgreSQL event-driven cache invalidation. ☆17 Updated 5 months ago
- LLM-Powered Analyses of your GitHub Community using EvaDB ☆22 Updated 11 months ago
- tsellm: LLMs in SQLite and DuckDB ☆21 Updated last month
- Python module that creates a context map for AI code generation ☆13 Updated last month
- This project involves using the LlamaIndex multi-agent concierge system and Qdrant vector database to customize the RAG application with use… ☆11 Updated last month
- Run embedding models using ONNX ☆23 Updated 7 months ago
- AI-aware proxy ☆16 Updated this week
- LLM code editor for backend services ☆10 Updated 2 months ago
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel… ☆11 Updated 7 months ago
- Efficient BM25 with DuckDB 🦆 ☆12 Updated last week
- QLLM: A powerful CLI for seamless interaction with multiple Large Language Models. Simplify AI workflows, streamline development, and unl… ☆23 Updated this week
- Define and implement any functions on the fly with LLMs ☆11 Updated 4 months ago
- Asynchronous tasks on the cloud ☆21 Updated 10 months ago
- Self-host LLMs with vLLM and BentoML ☆62 Updated this week
- 360M model running in the browser on WebGPU ☆18 Updated last month
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆26 Updated last year