guidance-ai / llgtrt
TensorRT-LLM server with Structured Outputs (JSON) built with Rust
☆45 · Updated this week
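llgtrt serves constrained, schema-valid JSON over an OpenAI-style HTTP API. As a quick illustration (not taken from the llgtrt docs), here is a minimal Rust sketch of a schema-constrained chat-completion request, assuming an OpenAI-compatible `/v1/chat/completions` route; the host, port, model id, and exact `response_format` support are assumptions.

```rust
// Minimal sketch: a JSON-schema-constrained completion request against an
// OpenAI-compatible endpoint such as the one llgtrt exposes.
// Assumed (not from llgtrt docs): address, model id, and that the server
// accepts the OpenAI-style `response_format: json_schema` field.
// Cargo deps: reqwest = { version = "0.12", features = ["blocking", "json"] },
//             serde_json = "1"
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        "model": "llama",  // assumed model id
        "messages": [
            { "role": "user", "content": "Give me a city as JSON." }
        ],
        // Structured-output request: the server constrains decoding so the
        // reply is guaranteed to match this schema.
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "city",
                "schema": {
                    "type": "object",
                    "properties": {
                        "name":       { "type": "string" },
                        "population": { "type": "integer" }
                    },
                    "required": ["name", "population"]
                }
            }
        }
    });

    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://localhost:3000/v1/chat/completions") // assumed address
        .json(&body)
        .send()?
        .json()?;

    // With constrained decoding, this string parses as schema-valid JSON.
    println!("{}", resp["choices"][0]["message"]["content"]);
    Ok(())
}
```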
Alternatives and similar repositories for llgtrt:
Users interested in llgtrt are comparing it to the libraries listed below.
- Super-fast Structured Outputs ☆171 · Updated this week
- Code for fine-tuning LLMs with GRPO specifically for Rust programming, using cargo as feedback ☆75 · Updated 3 weeks ago
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas… ☆80 · Updated last month
- A high-performance constrained-decoding engine based on context-free grammars, in Rust ☆48 · Updated 3 months ago
- Super-simple, fully Rust-powered "memory" (doc store + semantic search) for LLM projects, etc. ☆58 · Updated last year
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching, using MLX ☆76 · Updated 3 months ago
- Guaranteed structured outputs from any language model. Eliminate 100% of schema violations and state tracking failures in your LLM applic… ☆120 · Updated this week
- Inference Llama 2 in one file of zero-dependency, zero-unsafe Rust ☆37 · Updated last year
- ☆66 · Updated 10 months ago
- A high-performance batching router that optimises throughput for text-inference workloads ☆16 · Updated last year
- ☆126 · Updated 11 months ago
- Rust implementation of Hugging Face transformers pipelines using the onnxruntime backend, with bindings to C# and C ☆38 · Updated 2 years ago
- Inference of Mamba models in pure C ☆187 · Updated last year
- Tokun to can tokens ☆16 · Updated last month
- Implementation of Mamba in Rust ☆85 · Updated last year
- A tree-based prefix-cache library that allows rapid creation of looms: hierarchical branching pathways of LLM generations ☆68 · Updated last month
- Rust implementation of Surya ☆57 · Updated last month
- A single-binary, GPU-accelerated LLM server (HTTP and WebSocket API) written in Rust ☆79 · Updated last year
- Self-hosted LLM chatbot arena, with yourself as the only judge ☆38 · Updated last year
- ☆137 · Updated last year
- Faster structured generation ☆198 · Updated this week
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 · Updated 5 months ago
- GPU-accelerated client-side embeddings for vector search, RAG, etc. ☆66 · Updated last year
- Library for doing RAG (retrieval-augmented generation) ☆70 · Updated last week
- ☆38 · Updated last year
- ☆17 · Updated last week
- Editor with LLM generation tree exploration ☆65 · Updated last month
- A simple, CUDA- or CPU-powered library for creating vector embeddings using Candle and models from Hugging Face ☆34 · Updated 10 months ago
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ☆45 · Updated 6 months ago
- Distributed inference for MLX LLMs ☆87 · Updated 8 months ago