kozistr / triton-grpc-proxy-rs
A proxy server, written in Rust, for a Triton gRPC server that runs inference on an embedding model
☆17 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for triton-grpc-proxy-rs
- A high-performance batching router that optimises maximum throughput for text inference workloads ☆16 · Updated last year
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread ☆19 · Updated 7 months ago
- Simple examples using Argilla tools to build AI ☆40 · Updated this week
- Embedding models from Jina AI ☆56 · Updated 10 months ago
- utilities for loading and running text embeddings with onnx ☆39 · Updated 3 months ago
- A Python library to orchestrate LLMs in a neural network-inspired structure ☆41 · Updated last month
- NLP with Rust for Python 🦀🐍 ☆59 · Updated 5 months ago
- Using modal.com to process FineWeb-edu data ☆19 · Updated 2 months ago
- Chat Markup Language conversation library ☆54 · Updated 10 months ago
- Training code for Sparse Autoencoders on Embedding models ☆33 · Updated 3 weeks ago
- Prototyping a question and answer bot over PDFs ☆38 · Updated last year
- A library for incremental loading of large PyTorch checkpoints ☆56 · Updated last year
- Public reports detailing responses to sets of prompts by Large Language Models. ☆26 · Updated last year
- Routing on Random Forest (RoRF) ☆84 · Updated last month
- Build Agentic workflows with function calling ☆20 · Updated this week
- FalkorDB-Browser is a visualization UI for FalkorDB. ☆20 · Updated 2 weeks ago
- Efficiently computing & storing token n-grams from large corpora ☆15 · Updated last month
- A miniature version of Modal ☆18 · Updated 5 months ago
- Voyage AI Official Python Library ☆41 · Updated 2 weeks ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆23 · Updated last week
- Framework for Self-Organizing Python Agents ☆30 · Updated 9 months ago
- Because it's there. ☆14 · Updated 2 months ago
- GPU accelerated client-side embeddings for vector search, RAG etc. ☆63 · Updated 11 months ago
- Fast inference of Instruct tuned LLaMa on your personal devices. ☆22 · Updated last year
- RAG on codebases using treesitter and LanceDB ☆31 · Updated this week
- Vector Database with support for late interaction and token level embeddings. ☆54 · Updated last month