sophiegroenwold / AAVE_SAE_dataset
Dataset accompanying the paper "Investigating African-American Vernacular English in Transformer-Based Text Generation."
☆9 Updated 2 years ago
Alternatives and similar repositories for AAVE_SAE_dataset:
Users who are interested in AAVE_SAE_dataset are comparing it to the repositories listed below.
- ☆90 Updated 8 months ago
- Multi-LexSum is an abstractive summarization dataset for US Civil Rights Lawsuits ☆19 Updated 2 years ago
- Semantically Structured Sentence Embeddings ☆66 Updated 3 months ago
- MultiCite code and data. Models are available on Huggingface. ☆29 Updated 2 years ago
- Mining Legal Arguments in Court Decisions - Data and software ☆66 Updated last year
- The corresponding code for our paper: "Exploring the Challenges of Open Domain Multi-Document Summarization". Do not hesitate to open an … ☆32 Updated last year
- ☆14 Updated 4 years ago
- minimal pytorch implementation of bm25 (with sparse tensors) ☆97 Updated 11 months ago
- Information extraction from English and German texts based on predicate logic ☆135 Updated last year
- Factored Cognition Primer: How to write compositional language model programs ☆48 Updated last year
- SciRepEval benchmark training and evaluation scripts ☆72 Updated 8 months ago
- Pre-train Static Word Embeddings ☆47 Updated 2 weeks ago
- ☆38 Updated 9 months ago
- [Added T5 support to TRLX] A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ☆47 Updated 2 years ago
- A repository with several curated datasets of counter-narratives to fight online hate speech. ☆88 Updated last year
- ☆155 Updated 7 months ago
- A module to compute textual lexical richness (aka lexical diversity). ☆99 Updated last year
- Source code and data for Like a Good Nearest Neighbor ☆28 Updated last month
- Repository for Zheng and Guha et al., 2021, "When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Data… ☆86 Updated last year
- A Dataset for Direct Quotation Extraction and Attribution in News Articles. ☆13 Updated 3 years ago
- Documentation effort for the BookCorpus dataset ☆33 Updated 3 years ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆104 Updated 9 months ago
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection ☆101 Updated last year
- Code for Blodgett et al. 2016, Demographic dialectal variation in social media ☆25 Updated 5 years ago
- ☆39 Updated 3 years ago
- ☆30 Updated 4 months ago
- Detecting Bias and ensuring Fairness in AI solutions ☆87 Updated 2 years ago
- Multidocument Summarization for Literature Review Shared Task 2022 ☆28 Updated 2 years ago
- A dataset for pretraining language models targeted for legal tasks. ☆126 Updated 2 years ago
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ☆55 Updated 6 months ago