guidance-ai / llguidance
Super-fast Structured Outputs
☆330 Updated last week
Alternatives and similar repositories for llguidance
Users who are interested in llguidance are comparing it to the libraries listed below.
- Faster structured generation ☆230 Updated last month
- ☆363 Updated this week
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas… ☆187 Updated 3 weeks ago
- Formatron empowers everyone to control the format of language models' output with minimal overhead. ☆217 Updated last month
- TensorRT-LLM server with Structured Outputs (JSON) built with Rust ☆55 Updated 2 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆139 Updated this week
- Simple UI for debugging correlations of text embeddings ☆287 Updated last month
- Comparison of Language Model Inference Engines ☆219 Updated 6 months ago
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆137 Updated 2 months ago
- Fast parallel LLM inference for MLX ☆198 Updated last year
- ☆130 Updated last year
- A high-performance constrained decoding engine based on context free grammar in Rust ☆54 Updated last month
- ☆128 Updated 3 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆137 Updated 11 months ago
- ☆187 Updated 2 weeks ago
- This repository has code for fine-tuning LLMs with GRPO specifically for Rust Programming using cargo as feedback ☆97 Updated 4 months ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… ☆184 Updated 10 months ago
- ☆154 Updated 7 months ago
- ☆199 Updated last year
- High-Performance Engine for Multi-Vector Search ☆116 Updated last month
- Inference server benchmarking tool ☆79 Updated 2 months ago
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model ☆222 Updated last month
- Late Interaction Models Training & Retrieval ☆481 Updated this week
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆242 Updated last week
- Tutorial for building LLM router ☆216 Updated 11 months ago
- Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from R… ☆452 Updated this week
- ☆546 Updated 10 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆314 Updated 8 months ago
- Efficient vector database for hundreds of millions of embeddings. ☆206 Updated last year
- ☆214 Updated 5 months ago