kozistr / triton-grpc-proxy-rs

Proxy server, written in Rust, for a Triton gRPC server that runs embedding-model inference

☆21

Alternatives and similar repositories for triton-grpc-proxy-rs

Users interested in triton-grpc-proxy-rs are comparing it to the libraries listed below.


taylorai / onnx_embedding_models
Utilities for loading and running text embeddings with ONNX
☆44 · Updated last month
recmo / cria
Tiny inference-only implementation of LLaMA
☆93 · Updated last year
ialacol / text-inference-batcher
A high-performance batching router that optimizes maximum throughput for text inference workloads
☆16 · Updated 2 years ago
firstbatchxyz / function-calling-eval
The DPAB-α Benchmark
☆31 · Updated 8 months ago
michaelfeil / embed
A stable, fast and easy-to-use inference library with a focus on a sync-to-async API
☆45 · Updated last year
lightblue-tech / lb-reranker
☆23 · Updated 7 months ago
vikhyat / mixtral-inference
Inference code for mixtral-8x7b-32kseqlen
☆101 · Updated last year
LLukas22 / llm-rs-python
Unofficial Python bindings for the Rust llm library. 🐍❤️🦀
☆76 · Updated 2 years ago
ashvardanian / jaccard-index
Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables
☆21 · Updated 4 months ago
enjalot / latent-data-modal
Using modal.com to process FineWeb-edu data
☆20 · Updated 5 months ago
premAI-io / benchmarks
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
☆138 · Updated last year
deployradiant / pychatml
Chat Markup Language conversation library
☆55 · Updated last year
teknium1 / transformers-gptq-quant
☆46 · Updated last year
oKatanaaa / kolibrify
Curriculum training of instruction-following LLMs with Unsloth
☆14 · Updated 6 months ago
thooton / muse
Let's create synthetic textbooks together :)
☆75 · Updated last year
sdan / selfextend
An implementation of Self-Extend, to expand the context window via grouped attention
☆118 · Updated last year
FL33TW00D / embd
GPU-accelerated client-side embeddings for vector search, RAG, etc.
☆65 · Updated last year
thesephist / spectre
Sparse autoencoders for Contra text embedding models
☆25 · Updated last year
BBischof / yapping
Verbosity control for AI agents
☆65 · Updated last year
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆55 · Updated 7 months ago
PrithivirajDamodaran / blitz-embed
C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc…
☆23 · Updated last year
JD-P / RetroInstruct
Synthetic data derived by templating, few-shot prompting, transformations on public-domain corpora, and Monte Carlo tree search.
☆32 · Updated 6 months ago
notarussianteenager / srf-attention
Simplex Random Feature attention, in PyTorch
☆74 · Updated last year
oKatanaaa / lima-gui
A simple GUI utility for gathering LIMA-like chat data.
☆23 · Updated 6 months ago
abetlen / program-constrained-language-model-sampling
☆35 · Updated 2 years ago
facebookresearch / fastgen
Simple high-throughput inference library
☆128 · Updated 4 months ago
nyunAI / PruneGPT
☆51 · Updated last year
nicholasyager / llama-cpp-guidance
A guidance compatibility layer for llama-cpp-python
☆36 · Updated 2 years ago
DeployQL / LintDB
Vector database with support for late interaction and token-level embeddings.
☆55 · Updated 3 months ago
QuixiAI / kraken
☆67 · Updated last year