aastroza / structured-generation-benchmark
Structured Generation Evals
☆12 Updated last year
Alternatives and similar repositories for structured-generation-benchmark
Users who are interested in structured-generation-benchmark compare it to the libraries listed below.
 - ☆29 Updated last week
 - Benchmark structured generation libraries ☆29 Updated last year
 - minimal pytorch implementation of bm25 (with sparse tensors) ☆104 Updated this week
 - ☆80 Updated 2 weeks ago
 - ☆103 Updated 9 months ago
 - Probabilistic LLM evaluations. [CogSci2023; ACL2023] ☆72 Updated last year
 - Extract full next-token probabilities via language model APIs ☆247 Updated last year
 - Composable inference algorithms with LLMs and programmable logic ☆69 Updated 10 months ago
 - An attribution library for LLMs ☆43 Updated last year
 - An introduction to LLM Sampling ☆79 Updated 10 months ago
 - [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆99 Updated 11 months ago
 - A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆34 Updated 6 months ago
 - Experiments for efforts to train a new and improved t5 ☆75 Updated last year
 - The official code repo for "Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations". ☆83 Updated last year
 - ☆73 Updated 2 months ago
 - Storing long contexts in tiny caches with self-study ☆205 Updated 2 weeks ago
 - Fast, High-Fidelity LLM Decoding with Regex Constraints ☆20 Updated last year
 - Training code for Sparse Autoencoders on Embedding models ☆38 Updated 8 months ago
 - A domain-specific probabilistic programming language for modeling and inference with language models ☆136 Updated 6 months ago
 - Evaluating LLMs with fewer examples ☆164 Updated last year
 - ☆83 Updated 3 months ago
 - ☆50 Updated 8 months ago
 - ☆50 Updated last year
 - code for training & evaluating Contextual Document Embedding models ☆199 Updated 5 months ago
 - Using open source LLMs to build synthetic datasets for direct preference optimization ☆68 Updated last year
 - Advanced Reasoning Benchmark Dataset for LLMs ☆46 Updated last year
 - 📝 Reference-Free automatic summarization evaluation with potential hallucination detection ☆102 Updated last year
 - ☆58 Updated last year
 - An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆145 Updated 8 months ago
 - Tools to make language models a bit easier to use ☆54 Updated last month