IBM / comparing-corpora
A Python library of similarity measures for measuring the perceptual similarity between corpora represented as sets of embeddings.
☆12 Updated last year
Related projects
Alternatives and complementary repositories for comparing-corpora
- Training Temporal Word Embeddings with a Compass ☆64 Updated last year
- This repository implements the interaction with DBLP, information extraction and pre-processing of papers, and a client to store data to … ☆10 Updated last year
- A Python wrapper around the topic modeling functions of MALLET. ☆99 Updated 3 weeks ago
- Repository for the Tweet2Story framework for the extraction of narratives from tweets. ☆13 Updated 2 years ago
- This repository hosts the dataset for the paper Computer Science Named Entity Recognition in the Open Research Knowledge Graph ☆18 Updated 10 months ago
- Train, evaluate, and use different unsupervised topic modelling algorithms using a RESTful API. ☆36 Updated last year
- Compute novelty indicators ☆26 Updated 5 months ago
- An example of how to use spaCy for extremely large files without running into memory issues ☆36 Updated 2 years ago
- Package to extract connotation frames ☆80 Updated 11 months ago
- Bots for reviewing the credibility of web content: articles, tweets, sentences and websites ☆9 Updated last year
- The CleanCoNLL dataset from our EMNLP 2023 paper where we corrected annotation errors and inconsistencies in CoNLL-03. ☆19 Updated 4 months ago
- A toolkit for social media information extraction using multi-task learning and active learning ☆19 Updated last year
- Code for the paper "Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora", ACL 2020. ☆18 Updated 4 years ago
- Sentence embeddings for unsupervised event detection in the Twitter stream: study on English and French corpora ☆31 Updated last month
- Preprocessing and analysis for training SNOMED-CT concept embeddings from CORD-19 corpus ☆14 Updated last year
- Blazing fast topic modelling for short texts. ☆31 Updated last month
- MultiCite code and data. Models are available on Huggingface. ☆29 Updated 2 years ago
- The official implementation of the iConference 2022 paper "Identifying Machine-Paraphrased Plagiarism". ☆16 Updated 2 years ago
- Get annotation suggestions for the INCEpTION text annotation platform from spaCy, Sentence BERT, scikit-learn and more. Runs as a web-ser… ☆40 Updated last month
- An implementation of GrASP (Shnarch et al., 2017) ☆21 Updated 2 years ago
- Learned string similarity for entity names using optimal transport. ☆34 Updated 4 years ago
- This repository contains the code used for distillation and fine-tuning of compact biomedical transformers that have been introduced in t… ☆16 Updated 7 months ago
- A collection of notebooks for Natural Language Processing ☆24 Updated this week
- Python text processing, pattern matching, and NLP framework ☆63 Updated last year
- Sentence specificity prediction ☆25 Updated 5 years ago
- Word Sense Induction with BERT MLM ☆28 Updated last year
- Wayward is a Python package that helps to identify characteristic terms from single documents or groups of documents. It can be used for … ☆9 Updated 5 years ago