A curated list of resources on document similarity measures (papers, tutorials, code, ...)
★256 · Jul 13, 2022 · Updated 3 years ago
Alternatives and similar repositories for awesome-document-similarity
Users that are interested in awesome-document-similarity are comparing it to the libraries listed below.
- Implementation, trained models and result data for the paper "Aspect-based Document Similarity for Research Papers" #COLING2020 ★63 · Apr 30, 2024 · Updated last year
- Dehyphenation of broken text (mainly German), i.e., extracted from a PDF ★39 · Mar 8, 2022 · Updated 4 years ago
- ★25 · Mar 4, 2020 · Updated 6 years ago
- Course materials for "Sequencing Legal DNA: NLP for Law and Political Economy", to be taught at ETH February-May 2020 ★15 · Aug 18, 2021 · Updated 4 years ago
- This repository contains various ways to calculate sentence vector similarity using NLP models ★198 · Apr 14, 2020 · Updated 5 years ago
- State-of-the-art complex word identification models. ★14 · Sep 23, 2019 · Updated 6 years ago
- Extract networks of entities from journalistic reporting ★49 · Jul 17, 2023 · Updated 2 years ago
- Evaluate language models using multiple choice items ★13 · Mar 6, 2026 · Updated last month
- ★22 · Feb 1, 2024 · Updated 2 years ago
- ★23 · Jun 6, 2021 · Updated 4 years ago
- CLUE Emotion Analysis Dataset (fine-grained sentiment analysis dataset) ★10 · Jan 29, 2020 · Updated 6 years ago
- ★11 · Mar 15, 2024 · Updated 2 years ago
- security course list ★14 · Sep 18, 2015 · Updated 10 years ago
- A Python library for defining rule-based overrides on messy data ★18 · Nov 24, 2025 · Updated 4 months ago
- This is a prototype of a semi-automatic data anonymization app for German documents. ➡️ The project has moved to: https://gitlab.opencode… ★24 · Mar 20, 2026 · Updated 3 weeks ago
- Implementation of the BLUE benchmark with Transformers. ★20 · Feb 16, 2024 · Updated 2 years ago
- An opinionated NLP research template ★10 · Aug 29, 2024 · Updated last year
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal … ★32 · Apr 29, 2021 · Updated 4 years ago
- GlotWeb: Web Indexing for Minority Languages (WWW 2026) ★17 · Feb 27, 2026 · Updated last month
- A fast python implementation of the SimHash algorithm. ★27 · Oct 27, 2021 · Updated 4 years ago
- Python client to the INCEpTION annotation tool ★17 · Jun 10, 2025 · Updated 10 months ago
- A Test Collection of Computer Science Papers for Faceted Query by Example ★23 · Nov 28, 2021 · Updated 4 years ago
- Question Generation model implementation in pytorch ★12 · Dec 26, 2018 · Updated 7 years ago
- Dataset and Code for ACL 2023 paper: "IM-TQA: A Chinese Table Question Answering Dataset with Implicit and Multi-type Table Structures". … ★27 · Aug 6, 2024 · Updated last year
- Next-generation Punkt sentence boundary detection with zero dependencies ★30 · Nov 18, 2025 · Updated 4 months ago
- Basis of FragDenStaat.de's "Koalitionstracker" ★15 · Jul 14, 2025 · Updated 8 months ago
- A library for configuring SageMaker pipelines using hierarchical configuration pattern. ★10 · Aug 29, 2024 · Updated last year
- ★30 · Jun 23, 2022 · Updated 3 years ago
- ★16 · Jun 14, 2024 · Updated last year
- Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al. ★98 · Aug 14, 2023 · Updated 2 years ago
- Layerwise Relevance Visualization in Convolutional Text Graph Classifiers ★11 · Jun 2, 2021 · Updated 4 years ago
- An aspiring attempt to generate a continuous space of sentences with DenseNet ★26 · May 4, 2017 · Updated 8 years ago
- Use spaCy for NLP and output to the FoLiA XML format. ★12 · Feb 27, 2024 · Updated 2 years ago
- Source Code for paper "NERO: A Neural Rule Grounding Framework for Label-Efficient Relation Extraction", WWW 2020 ★46 · May 6, 2020 · Updated 5 years ago
- Trials of pre-trained BERT models for the medical domain in Japanese. ★12 · Nov 21, 2020 · Updated 5 years ago
- GC4LM: A Colossal (Biased) language model for German ★13 · May 2, 2021 · Updated 4 years ago
- Code from the pyGuru YouTube channel. Please do not submit pull requests, they will be ignored/closed. The code in the repo needs to rema… ★11 · Jul 24, 2023 · Updated 2 years ago
- Python Package to reconstruct the original continuous text from PDFs with language models ★32 · Sep 8, 2023 · Updated 2 years ago
- Implementation of Nested Named Entity Recognition using Flair ★24 · Oct 29, 2021 · Updated 4 years ago