aastroza / structured-generation-benchmark
Structured Generation Evals
☆11 Updated last month
Related projects
Alternatives and complementary repositories for structured-generation-benchmark
- Replicating O1 inference-time scaling laws ☆48 Updated last month
- ☆18 Updated 3 weeks ago
- Functional Benchmarks and the Reasoning Gap ☆78 Updated last month
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆61 Updated 4 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆84 Updated 3 months ago
- ☆74 Updated last week
- Codebase accompanying the Summary of a Haystack paper. ☆71 Updated last month
- ☆38 Updated this week
- Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State ☆17 Updated 10 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆40 Updated 8 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆141 Updated last month
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆22 Updated 7 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆46 Updated 2 months ago
- ☆27 Updated last month
- minimal pytorch implementation of bm25 (with sparse tensors) ☆88 Updated 8 months ago
- ☆26 Updated 4 months ago
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆36 Updated last year
- Advanced Reasoning Benchmark Dataset for LLMs ☆45 Updated 11 months ago
- Score LLM pretraining data with classifiers ☆55 Updated last year
- RepoQA: Evaluating Long-Context Code Understanding ☆99 Updated last week
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆61 Updated 4 months ago
- ☆21 Updated 5 months ago
- ☆28 Updated 2 weeks ago
- ☆40 Updated 3 weeks ago
- Sphynx Hallucination Induction ☆47 Updated 3 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆73 Updated 2 months ago
- A curated list of papers related to constrained decoding of LLM, along with their relevant code and resources. ☆87 Updated last week
- LLMs as Collaboratively Edited Knowledge Bases ☆42 Updated 8 months ago
- 🔗 LINC: Logical Inference via Neurosymbolic Computation [EMNLP2023] ☆55 Updated 10 months ago