kozistr / triton-grpc-proxy-rs
Proxy server, written in Rust, for a Triton gRPC server that runs inference on an embedding model
☆21 Updated last year
Alternatives and similar repositories for triton-grpc-proxy-rs
Users who are interested in triton-grpc-proxy-rs compare it to the repositories listed below.
- A high-performance batching router that optimises maximum throughput for text inference workloads ☆16 Updated 2 years ago
- utilities for loading and running text embeddings with onnx ☆44 Updated 2 months ago
- inference code for mixtral-8x7b-32kseqlen ☆102 Updated last year
- Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables ☆21 Updated 5 months ago
- Unofficial python bindings for the rust llm library. 🐍❤️🦀 ☆76 Updated 2 years ago
- Simplex Random Feature attention, in PyTorch ☆73 Updated 2 years ago
- Synthetic data derived by templating, few shot prompting, transformations on public domain corpora, and monte carlo tree search. ☆32 Updated 3 weeks ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆139 Updated last year
- ☆39 Updated 3 years ago
- OpenAI compatible API for serving LLAMA-2 model ☆218 Updated 2 years ago
- Full finetuning of large language models without large memory requirements ☆93 Updated last month
- GPU accelerated client-side embeddings for vector search, RAG etc. ☆65 Updated last year
- Tiny inference-only implementation of LLaMA ☆92 Updated last year
- Modified Stanford-Alpaca Trainer for Training Replit's Code Model ☆41 Updated 2 years ago
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas… ☆210 Updated 3 weeks ago
- ☆46 Updated 2 years ago
- Using modal.com to process FineWeb-edu data ☆20 Updated 6 months ago
- Chat Markup Language conversation library ☆55 Updated last year
- TensorRT-LLM server with Structured Outputs (JSON) built with Rust ☆60 Updated 6 months ago
- an implementation of Self-Extend, to expand the context window via grouped attention ☆118 Updated last year
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ☆45 Updated last year
- Training Models Daily ☆16 Updated last year
- The DPAB-α Benchmark ☆30 Updated 9 months ago
- Let's create synthetic textbooks together :) ☆75 Updated last year
- Estimate Your LLM's Token Toll Across Various Platforms and Configurations ☆37 Updated 9 months ago
- Fast inference of Instruct tuned LLaMa on your personal devices. ☆23 Updated 2 years ago
- Sparse autoencoders for Contra text embedding models ☆25 Updated last year
- Vector Database with support for late interaction and token level embeddings. ☆55 Updated 4 months ago
- ☆135 Updated last year
- ☆40 Updated 2 years ago