kozistr / triton-grpc-proxy-rs
Proxy server, written in Rust, for a Triton gRPC server that runs inference on an embedding model
★21 · Updated 11 months ago
Alternatives and similar repositories for triton-grpc-proxy-rs
Users interested in triton-grpc-proxy-rs are comparing it to the libraries listed below.
- an implementation of Self-Extend, to expand the context window via grouped attention ★119 · Updated last year
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ★137 · Updated 11 months ago
- Simplex Random Feature attention, in PyTorch ★74 · Updated last year
- inference code for mixtral-8x7b-32kseqlen ★100 · Updated last year
- Tiny inference-only implementation of LLaMA ★93 · Updated last year
- A high-performance batching router that optimises throughput for text inference workloads ★16 · Updated last year
- utilities for loading and running text embeddings with ONNX ★44 · Updated 11 months ago
- ★199 · Updated last year
- Unofficial Python bindings for the Rust llm library. 🐍❤️🦀 ★75 · Updated last year
- GPU-accelerated client-side embeddings for vector search, RAG etc. ★66 · Updated last year
- Let's create synthetic textbooks together :) ★75 · Updated last year
- Full finetuning of large language models without large memory requirements ★94 · Updated last year
- Modified Stanford-Alpaca Trainer for Training Replit's Code Model ★41 · Updated 2 years ago
- Chat Markup Language conversation library ★55 · Updated last year
- ★47 · Updated last year
- Comprehensive analysis of the differences in performance between QLoRA, LoRA, and full finetunes. ★82 · Updated last year
- Synthetic data derived by templating, few-shot prompting, transformations on public-domain corpora, and Monte Carlo tree search. ★32 · Updated 4 months ago
- ★66 · Updated last year
- Sparse autoencoders for Contra text embedding models ★25 · Updated last year
- ⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW. ★144 · Updated last year
- Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip. ★44 · Updated last year
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ★45 · Updated 9 months ago
- ★39 · Updated 2 years ago
- Just a bunch of benchmark logs for different LLMs ★119 · Updated 11 months ago
- TOPLOC is a novel method for verifiable inference that enables users to verify that LLM providers are using the correct model configuration ★34 · Updated 3 months ago
- A miniature version of Modal ★20 · Updated last year
- Curriculum training of instruction-following LLMs with Unsloth ★14 · Updated 4 months ago
- Replace expensive LLM calls with finetunes automatically ★65 · Updated last year
- Modded vLLM to run pipeline parallelism over public networks ★37 · Updated last month
- Low-Rank adapter extraction for fine-tuned transformers models ★173 · Updated last year