kozistr / triton-grpc-proxy-rs
Proxy server, written in Rust, for a Triton gRPC server serving embedding-model inference
☆21 · Updated last year
Alternatives and similar repositories for triton-grpc-proxy-rs
Users interested in triton-grpc-proxy-rs are comparing it to the libraries listed below.
- A high-performance batching router that optimises max throughput for text inference workloads ☆16 · Updated 2 years ago
- Utilities for loading and running text embeddings with ONNX ☆44 · Updated 4 months ago
- Unofficial Python bindings for the Rust llm library. 🐍❤️🦀 ☆76 · Updated 2 years ago
- Chat Markup Language conversation library ☆55 · Updated last year
- Tiny inference-only implementation of LLaMA ☆92 · Updated last year
- ☆45 · Updated 2 years ago
- ☆198 · Updated last year
- Full finetuning of large language models without large memory requirements ☆94 · Updated 3 months ago
- Simplex Random Feature attention, in PyTorch ☆75 · Updated 2 years ago
- An implementation of Self-Extend to expand the context window via grouped attention ☆119 · Updated last year
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ☆46 · Updated last year
- Inference code for mixtral-8x7b-32kseqlen ☆104 · Updated 2 years ago
- ☆35 · Updated 2 years ago
- Fast inference of instruct-tuned LLaMA on your personal devices. ☆23 · Updated 2 years ago
- GPU-accelerated client-side embeddings for vector search, RAG, etc. ☆65 · Updated 2 years ago
- Synthetic data derived by templating, few-shot prompting, transformations on public-domain corpora, and Monte Carlo tree search. ☆32 · Updated 2 months ago
- Vector database with support for late interaction and token-level embeddings. ☆54 · Updated 6 months ago
- Using modal.com to process FineWeb-edu data ☆20 · Updated 8 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models ☆139 · Updated last year
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated last year
- ☆39 · Updated 3 years ago
- Let's create synthetic textbooks together :) ☆75 · Updated last year
- Curriculum training of instruction-following LLMs with Unsloth ☆14 · Updated last week
- Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables ☆21 · Updated 7 months ago
- Transformer GPU VRAM estimator ☆67 · Updated last year
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models ☆70 · Updated 2 years ago
- ☆24 · Updated 10 months ago
- A lightweight, open-source blueprint for building powerful and scalable LLM chat applications ☆28 · Updated last year
- A guidance compatibility layer for llama-cpp-python ☆36 · Updated 2 years ago
- Estimate Your LLM's Token Toll Across Various Platforms and Configurations ☆38 · Updated last month