peterbaile / beaver
🦫 BEAVER: An Enterprise Benchmark for Text-to-SQL
☆21 Updated 4 months ago
Alternatives and similar repositories for beaver
Users who are interested in beaver are comparing it to the repositories listed below.
- Please visit https://github.com/HKUSTDial/NL2SQL360 to get the official code! ☆10 Updated last year
- Benchmarking library for RAG ☆229 Updated 2 months ago
- ☆110 Updated 2 weeks ago
- Code for the paper "Understanding the Effects of Noise in Text-to-SQL: An Examination of the BIRD-Bench Benchmark". ☆18 Updated last year
- Comprehensive benchmark for RAG ☆218 Updated 3 months ago
- Introduction page of a challenging text-to-SQL dataset: KaggleDBQA ☆38 Updated 2 years ago
- ☆189 Updated 3 months ago
- [ICLR 2025] BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆167 Updated 3 weeks ago
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆201 Updated 10 months ago
- Official repository for paper "ReasonIR Training Retrievers for Reasoning Tasks". ☆202 Updated 3 months ago
- 🌲 Code for our EMNLP 2023 paper - "Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Mode… ☆51 Updated last year
- Open-WikiTable: Dataset for Open Domain Question Answering with Complex Reasoning over Table ☆25 Updated 2 years ago
- Semantic Evaluation for Text-to-SQL with Distilled Test Suites ☆298 Updated last year
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation ☆80 Updated 4 months ago
- Document Ranking with Large Language Models. ☆190 Updated last week
- ☆291 Updated last year
- ☆122 Updated 2 years ago
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision ☆92 Updated 11 months ago
- ☆17 Updated last year
- Repository for MuSiQue: Multi-hop Questions via Single-hop Question Composition, TACL 2022 ☆168 Updated last year
- Official repository for ListT5 ☆48 Updated 7 months ago
- [ICLR 2025] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales ☆124 Updated 8 months ago
- [NAACL'24] Dataset, code and models for "TableLlama: Towards Open Large Generalist Models for Tables". ☆131 Updated last year
- The Universe of Evaluation. All about the evaluation for LLMs. ☆226 Updated last year
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆385 Updated 5 months ago
- Use contrastive learning to train a large language model (LLM) as a retriever ☆12 Updated last year
- The prediction results of ChatGPT on various datasets of Text-to-SQL. ☆102 Updated 2 years ago
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆212 Updated 3 months ago
- ✨✨ Latest Papers about LLM-based Evaluators ☆30 Updated last year
- Codes and packages for the paper titled Evaluating Retrieval Quality in Retrieval-Augmented Generation. ☆26 Updated 4 months ago