guidance-ai / llguidance
Super-fast Structured Outputs
☆204 Updated this week
Alternatives and similar repositories for llguidance:
Users who are interested in llguidance are comparing it to the libraries listed below.
- Faster structured generation ☆205 Updated last week
- TensorRT-LLM server with Structured Outputs (JSON) built with Rust ☆49 Updated 2 weeks ago
- Comparison of Language Model Inference Engines ☆214 Updated 4 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆130 Updated 4 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆254 Updated 9 months ago
- ☆113 Updated 2 weeks ago
- Fast, Flexible and Portable Structured Generation ☆888 Updated last week
- Efficient platform for inference and serving local LLMs including an OpenAI compatible API server. ☆349 Updated this week
- Inference server benchmarking tool ☆51 Updated 2 weeks ago
- Formatron empowers everyone to control the format of language models' output with minimal overhead. ☆193 Updated 3 months ago
- A high-performance constrained decoding engine based on context free grammar in Rust ☆50 Updated 3 months ago
- Long context evaluation for large language models ☆207 Updated last month
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆178 Updated this week
- ☆128 Updated 11 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 Updated 6 months ago
- ☆207 Updated 2 months ago
- ☆150 Updated 4 months ago
- ☆153 Updated 2 weeks ago
- Fast parallel LLM inference for MLX ☆181 Updated 9 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆317 Updated 4 months ago
- PyTorch building blocks for the OLMo ecosystem ☆197 Updated this week
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆139 Updated 2 months ago
- code for training & evaluating Contextual Document Embedding models ☆180 Updated this week
- Train your own SOTA deductive reasoning model ☆86 Updated last month
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆136 Updated 8 months ago
- XTR/WARP is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆123 Updated 6 months ago
- This repository has code for fine-tuning LLMs with GRPO specifically for Rust Programming using cargo as feedback ☆80 Updated last month
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆197 Updated 9 months ago
- ☆199 Updated last year
- This is our own implementation of 'Layer Selective Rank Reduction' ☆235 Updated 10 months ago