gersteinlab / Struc-Bench

[NAACL 2024] Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data? https://aclanthology.org/2024.naacl-short.2/

☆54

Alternatives and similar repositories for Struc-Bench

Users who are interested in Struc-Bench are comparing it to the repositories listed below.

Anni-Zou / Meta-CoT
Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models
☆97 Updated last year
abhika-m / FAVA
☆73 Updated last year
OSU-NLP-Group / Middleware
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)
☆37 Updated 7 months ago
Edward-Sun / RECITE
Code of ICLR paper: https://openreview.net/forum?id=-cqvvvb-NkI
☆94 Updated 2 years ago
icip-cas / SelfRetrieval
☆35 Updated 8 months ago
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆81 Updated 11 months ago
ytyz1307zzh / RefAug
Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning"
☆55 Updated 10 months ago
r-three / phatgoose
Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"
☆86 Updated last year
allenai / super-benchmark
☆45 Updated 4 months ago
GasolSun36 / Iter-CoT
[NAACL 2024] Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models
☆85 Updated last year
salesforce / summary-of-a-haystack
Codebase accompanying the Summary of a Haystack paper.
☆79 Updated 10 months ago
SALT-NLP / demonstrated-feedback
☆125 Updated 10 months ago
para-lost / ReBase
ReBase: Training Task Experts through Retrieval Based Distillation
☆29 Updated 6 months ago
OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54 Updated last year
yueyu1030 / AttrPrompt
[NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.
☆152 Updated last year
DAMO-NLP-SG / CaRing
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs
☆38 Updated last year
oriyor / reasoning-on-cots
Implementation of the paper: "Answering Questions by Meta-Reasoning over Multiple Chains of Thought"
☆96 Updated last year
cambridgeltl / PairS
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024)
☆47 Updated 6 months ago
TIGER-AI-Lab / StructLM
Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)
☆75 Updated 9 months ago
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆68 Updated 3 months ago
kaistAI / Janus
[NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages
☆49 Updated 8 months ago
xingyaoww / LeTI
Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions."
☆65 Updated 2 years ago
casmlab / NPHardEval
Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
☆57 Updated last year
belindal / ERASE
Code and Data for "Language Modeling with Editable External Knowledge"
☆34 Updated last year
nlp-uoregon / ullme
☆20 Updated 3 months ago
OSU-NLP-Group / In-Context-Reranking
[ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers"
☆28 Updated 4 months ago
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆42 Updated last year
wang-research-lab / agentinstruct
Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"
☆115 Updated 10 months ago
allenai / marg-reviewer
Code/data for MARG (multi-agent review generation)
☆46 Updated 8 months ago
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆146 Updated 9 months ago