cardiffnlp / timelms
TimeLMs: Diachronic Language Models from Twitter
☆100 Updated 6 months ago
Related projects:
- The NewSHead dataset is a multi-doc headline dataset used in NHNet for training a headline summarization model. ☆35 Updated 2 years ago
- ☆73 Updated 3 years ago
- Code and data from the paper BERT Got a Date: Introducing Transformers to Temporal Tagging ☆65 Updated 2 years ago
- This repository contains materials for the SIGIR 2022 tutorial on opinion summarization. ☆34 Updated 2 years ago
- Collection of NLP model explanations and accompanying analysis tools ☆143 Updated last year
- Apps built using Inspired Cognition's Critique. ☆58 Updated last year
- A Python library aimed at dissecting and augmenting NER training data. ☆56 Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. ☆91 Updated last year
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021) ☆46 Updated 3 years ago
- Framework for unified summarisation and evaluation of English documents using state-of-the-art models and measures. ☆30 Updated 4 months ago
- Explainable Zero-Shot Topic Extraction ☆62 Updated last month
- Google's BigBird (Jax/Flax & PyTorch) @ 🤗Transformers ☆47 Updated last year
- Semantically Structured Sentence Embeddings ☆65 Updated 10 months ago
- A library to synthesize text datasets using Large Language Models (LLM) ☆151 Updated last year
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆149 Updated 3 months ago
- A curated list of awesome datasets with human label variation (un-aggregated labels) in Natural Language Processing and Computer Vision, … ☆74 Updated 5 months ago
- [LREC 2022] An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweeban… ☆101 Updated 7 months ago
- ☆82 Updated 3 weeks ago
- Dataset from the paper "Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering" (COLING 2022) ☆102 Updated last year
- 💫 SpaCy wrapper for ConceptNet 💫 ☆88 Updated last year
- A monolingual and cross-lingual meta-embedding generation and evaluation framework ☆79 Updated 2 years ago
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP. ☆85 Updated 2 months ago
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆39 Updated 2 years ago
- ☆24 Updated 8 months ago
- [DEPRECATED] Adapt Transformer-based language models to new text domains ☆86 Updated 6 months ago
- Creating class-based TF-IDF matrices ☆81 Updated last year
- Code & Data for Comparative Opinion Summarization via Collaborative Decoding (Iso et al; Findings of ACL 2022) ☆21 Updated last year
- ☆22 Updated last year
- ☆142 Updated 3 months ago
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models ☆63 Updated last year