kozistr / triton-grpc-proxy-rs
A proxy server, written in Rust, for a Triton gRPC server that runs inference on an embedding model
☆21 · Updated last year
Alternatives and similar repositories for triton-grpc-proxy-rs
Users who are interested in triton-grpc-proxy-rs compare it to the repositories listed below
- utilities for loading and running text embeddings with onnx ☆44 · Updated last month
- Tiny inference-only implementation of LLaMA ☆93 · Updated last year
- A high performance batching router optimises max throughput for text inference workload ☆16 · Updated 2 years ago
- The DPAB-α Benchmark ☆31 · Updated 8 months ago
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ☆45 · Updated last year
- ☆23 · Updated 7 months ago
- inference code for mixtral-8x7b-32kseqlen ☆101 · Updated last year
- Unofficial python bindings for the rust llm library. 🐍❤️🦀 ☆76 · Updated 2 years ago
- Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables ☆21 · Updated 4 months ago
- Using modal.com to process FineWeb-edu data ☆20 · Updated 5 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆138 · Updated last year
- Chat Markup Language conversation library ☆55 · Updated last year
- ☆46 · Updated last year
- Curriculum training of instruction-following LLMs with Unsloth ☆14 · Updated 6 months ago
- Let's create synthetic textbooks together :) ☆75 · Updated last year
- an implementation of Self-Extend, to expand the context window via grouped attention ☆118 · Updated last year
- GPU accelerated client-side embeddings for vector search, RAG etc. ☆65 · Updated last year
- Sparse autoencoders for Contra text embedding models ☆25 · Updated last year
- Verbosity control for AI agents ☆65 · Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆55 · Updated 7 months ago
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc… ☆23 · Updated last year
- Synthetic data derived by templating, few shot prompting, transformations on public domain corpora, and monte carlo tree search. ☆32 · Updated 6 months ago
- Simplex Random Feature attention, in PyTorch ☆74 · Updated last year
- A simple GUI utility for gathering LIMA-like chat data. ☆23 · Updated 6 months ago
- ☆35 · Updated 2 years ago
- Simple high-throughput inference library ☆128 · Updated 4 months ago
- ☆51 · Updated last year
- A guidance compatibility layer for llama-cpp-python ☆36 · Updated 2 years ago
- Vector Database with support for late interaction and token level embeddings. ☆55 · Updated 3 months ago
- ☆67 · Updated last year