spapicchio / QATCH
Official implementation of QATCH: Benchmarking SQL-centric tasks with Table Representation Learning Models on Your Data
★ 25 · Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for QATCH
- Interpretability for sequence generation models ★ 374 · Updated this week
- Repository for the EMNLP 2022 paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation ★ 193 · Updated 9 months ago
- A Python package for benchmarking interpretability techniques on Transformers. ★ 211 · Updated last month
- Source code of the paper "GPTScore: Evaluate as You Desire" ★ 230 · Updated last year
- Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning" ★ 430 · Updated last year
- Scalable training for dense retrieval models. ★ 270 · Updated last year
- Powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvement: "GPL: …" ★ 322 · Updated last year
- Code and model release for the paper "Task-aware Retrieval with Instructions" by Asai et al. ★ 159 · Updated last year
- Benchmarking library for RAG ★ 112 · Updated this week
- Multilingual Large Language Models Evaluation Benchmark ★ 105 · Updated 2 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ★ 122 · Updated 7 months ago
- [NAACL'24] Dataset, code and models for "TableLlama: Towards Open Large Generalist Models for Tables". ★ 113 · Updated 5 months ago
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic…" ★ 289 · Updated 5 months ago
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ★ 114 · Updated last month
- [ACL 2023] AlignScore, a metric for factual consistency evaluation. ★ 110 · Updated 7 months ago
- Inquisitive Parrots for Search ★ 177 · Updated 8 months ago
- ITALIC: An ITALian Intent Classification Dataset ★ 11 · Updated 11 months ago
- A framework for few-shot evaluation of autoregressive language models. ★ 101 · Updated last year
- Repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models. ★ 408 · Updated 8 months ago
- Calculate perplexity on a text with pre-trained language models. Supports MLM (e.g. DeBERTa), recurrent LM (e.g. GPT-3), and encoder-decoder … ★ 132 · Updated last month
- Code for the paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment" ★ 258 · Updated 9 months ago
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ★ 467 · Updated 4 months ago
- Code and data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ★ 78 · Updated 2 months ago
- Train Llama 2 & 3 on the SQuAD v2 task as an example of how to specialize a generalized (foundation) model. ★ 47 · Updated 5 months ago
- What's In My Big Data (WIMBD): a toolkit for analyzing large text datasets ★ 188 · Updated 2 months ago
- Use Large Language Models like OpenAI's GPT-3.5 for data annotation and model enhancement. This framework combines human expertise with L… ★ 29 · Updated last year
- A Survey on Data Selection for Language Models ★ 178 · Updated 3 weeks ago
- RARR: Researching and Revising What Language Models Say, Using Language Models ★ 43 · Updated last year