peterbaile / beaver
🦫 BEAVER: An Enterprise Benchmark for Text-to-SQL
★21 · Updated 5 months ago
Alternatives and similar repositories for beaver
Users who are interested in beaver are comparing it to the libraries listed below.
- ★116 · Updated last month
- Benchmarking library for RAG ★235 · Updated 3 weeks ago
- Github repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ★205 · Updated 10 months ago
- Comprehensive benchmark for RAG ★231 · Updated 4 months ago
- ★189 · Updated 3 months ago
- Code for the paper "Understanding the Effects of Noise in Text-to-SQL: An Examination of the BIRD-Bench Benchmark". ★18 · Updated last year
- Please visit https://github.com/HKUSTDial/NL2SQL360 to get the official code! ★10 · Updated last year
- ★292 · Updated last year
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation ★82 · Updated 5 months ago
- Semantic Evaluation for Text-to-SQL with Distilled Test Suites ★301 · Updated last year
- [ICLR 2024 & NeurIPS 2023 WS] An Evaluator LM that is open-source, offers reproducible evaluation, and inexpensive to use. Specifically d… ★306 · Updated last year
- [EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627 ★497 · Updated last year
- Document Ranking with Large Language Models. ★191 · Updated last month
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ★394 · Updated 6 months ago
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning ★248 · Updated 2 years ago
- Introduction page of a challenging text-to-SQL dataset: KaggleDBQA ★39 · Updated 2 years ago
- Repository for Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions, ACL23 ★236 · Updated last year
- The prediction results of ChatGPT on various datasets of Text-to-SQL. ★102 · Updated 2 years ago
- [ICLR 2025] BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ★169 · Updated last month
- Official repository for paper "ReasonIR Training Retrievers for Reasoning Tasks". ★205 · Updated 4 months ago
- A large-scale information-rich web dataset, featuring millions of real clicked query-document labels ★345 · Updated 10 months ago
- [ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ★218 · Updated last year
- Code and data for "Lost in the Middle: How Language Models Use Long Contexts" ★361 · Updated last year
- RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation. ★140 · Updated 5 months ago
- Source Code of Paper "GPTScore: Evaluate as You Desire" ★257 · Updated 2 years ago
- Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment" ★388 · Updated last year
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ★545 · Updated this week
- [NAACL'24] Dataset, code and models for "TableLlama: Towards Open Large Generalist Models for Tables".