defog-ai / defog-data
This repository contains the metadata and data of the different databases that we use for testing.
☆13 · Updated this week
Related projects
Alternatives and complementary repositories for defog-data
- This project studies the performance and robustness of language models and task-adaptation methods. ☆141 · Updated 5 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 · Updated 7 months ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆64 · Updated 3 weeks ago
- Introduction page of a challenging text-to-SQL dataset: KaggleDBQA ☆33 · Updated last year
- A large-scale multilingual dataset for information retrieval, with thorough human annotations across 18 diverse languages. ☆167 · Updated 3 months ago
- Leveraging large language models for text-to-SQL synthesis, this project fine-tunes WizardLM/WizardCoder-15B-V1.0 with QLoRA on a custom … ☆43 · Updated 11 months ago
- FastFit ⚡ When LLMs are Unfit, Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆182 · Updated last month
- AuditNLG: Auditing Generative AI Language Modeling for Trustworthiness ☆97 · Updated last year
- Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data ☆97 · Updated 3 years ago
- Dense X Retrieval: What Retrieval Granularity Should We Use? ☆131 · Updated 10 months ago
- This is the code for our KILT leaderboard submissions (KGI + Re2G models). ☆148 · Updated last year
- Code for the Multilingual Eval of Generative AI paper published at EMNLP 2023 ☆65 · Updated 8 months ago
- GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training ☆101 · Updated 7 months ago
- Evaluating tool-augmented LLMs in conversation settings ☆72 · Updated 5 months ago
- MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, … ☆122 · Updated 2 years ago
- Tools for managing datasets for governance and training. ☆77 · Updated 2 weeks ago
- Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 q… ☆85 · Updated 8 months ago
- Convert natural language queries to appropriate SQL; make ERPs cool again. ☆73 · Updated 4 years ago
- A multilingual version of the MS MARCO passage ranking dataset ☆141 · Updated last year
- Finetune mistral-7b-instruct for sentence embeddings ☆70 · Updated 6 months ago
- This repository provides scripts for evaluating NLP models on the LEXTREME benchmark, a set of diverse multilingual tasks in legal NLP ☆20 · Updated 10 months ago
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆114 · Updated last month
- A collection of task-specific NLU datasets ☆145 · Updated 2 years ago
- [EMNLP 2023 Demo] fabricator: annotating and generating datasets with large language models. ☆101 · Updated 5 months ago
- [ACL24] Official repo for "Synthesizing Text-to-SQL Data from Weak and Strong LLMs" ☆60 · Updated 3 months ago