spapicchio / QATCH
Official implementation of QATCH: Benchmarking SQL-centric tasks with Table Representation Learning Models on Your Data
☆28 · Updated 3 weeks ago
Alternatives and similar repositories for QATCH:
Users interested in QATCH are comparing it to the libraries listed below.
- The implementation of "RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question" [ACL 2023] · ☆16 · Updated last year
- ACL 2023: AlignScore, a metric for factual consistency evaluation. · ☆127 · Updated last year
- Interpretability for sequence generation models 🐛 🔍 · ☆412 · Updated 5 months ago
- Code and model release for the paper "Task-aware Retrieval with Instructions" by Asai et al. · ☆162 · Updated last year
- Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: … · ☆333 · Updated last year
- This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models. · ☆463 · Updated last year
- Source Code of Paper "GPTScore: Evaluate as You Desire" · ☆246 · Updated 2 years ago
- Multilingual Large Language Models Evaluation Benchmark · ☆123 · Updated 8 months ago
- Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation · ☆198 · Updated last year
- ☆174 · Updated 2 years ago
- Code associated with the paper "Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists" · ☆48 · Updated 2 years ago
- A Python package for benchmarking interpretability techniques on Transformers. · ☆212 · Updated 6 months ago
- FENICE (Factuality Evaluation of Summarization based on Natural Language Inference and Claim Extraction) is a factuality-oriented metric … · ☆18 · Updated 4 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers · ☆128 · Updated last year
- ☆49 · Updated last week
- 🦫 BEAVER: An Enterprise Benchmark for Text-to-SQL · ☆16 · Updated last month
- A framework for few-shot evaluation of autoregressive language models. · ☆104 · Updated last year
- ☆360 · Updated last year
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… · ☆340 · Updated last week
- Calculate perplexity on a text with pre-trained language models. Supports MLM (e.g. DeBERTa), recurrent LM (e.g. GPT3), and encoder-decoder … · ☆155 · Updated 6 months ago
- Contrastive decoding · ☆199 · Updated 2 years ago
- Inquisitive Parrots for Search · ☆190 · Updated last year
- Token-level Reference-free Hallucination Detection · ☆94 · Updated last year
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023 · ☆68 · Updated last year
- ☆73 · Updated last year
- Codebase, data and models for the SummaC paper in TACL · ☆91 · Updated 2 months ago
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. · ☆137 · Updated 4 months ago
- Datasets collection and preprocessing framework for NLP extreme multitask learning · ☆180 · Updated 3 months ago
- Scalable training for dense retrieval models. · ☆292 · Updated last month
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" · ☆170 · Updated 4 months ago