epfl-dlab / homepage2vec
Language-Agnostic Website Embedding and Classification
☆46 Updated 2 years ago
Alternatives and similar repositories for homepage2vec
Users who are interested in homepage2vec are comparing it to the libraries listed below.
- potato: portable text annotation tool ☆364 Updated this week
- Mapping Wikipedia pages to Wikidata IDs and vice versa. ☆172 Updated 2 years ago
- Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang… ☆197 Updated 3 years ago
- Interpretable Evaluation for AI Systems ☆366 Updated 2 years ago
- This is a repository of the study performed under the Adversarial Paraphrasing Task (APT). ☆25 Updated 4 years ago
- A module to compute textual lexical richness (aka lexical diversity). ☆112 Updated 2 years ago
- Active Learning for Text Classification in Python ☆638 Updated last week
- An open-source text summarization toolkit for non-experts. EMNLP'2021 Demo ☆280 Updated 2 years ago
- Repository for XLM-T, a framework for evaluating multilingual language models on Twitter data ☆160 Updated 3 years ago
- The official code for PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization ☆157 Updated 3 years ago
- Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining. ☆31 Updated 2 years ago
- The corresponding code from our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations". Do not hesitate to o… ☆379 Updated 2 years ago
- This repository contains the code for "Generating Datasets with Pretrained Language Models". ☆189 Updated 4 years ago
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network ☆297 Updated last year
- A Framework for Textual Entailment based Zero Shot text classification ☆153 Updated last year
- A set of Python scripts for preprocessing the Wikidata JSON dump and running simple queries in an efficient manner. ☆141 Updated last year
- Repository for Zheng and Guha et al., 2021, "When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Data… ☆95 Updated 2 years ago
- Search Engines with Autoregressive Language models ☆295 Updated 2 years ago
- Concept Modeling: Topic Modeling on Images and Text ☆217 Updated last year
- Official code and data repository for our EMNLP 2020 long paper "Reformulating Unsupervised Style Transfer as Paraphrase Generation" (htt… ☆240 Updated 3 years ago
- Source codes for the paper "Examining the Ordering of Rhetorical Strategies in Persuasive Requests" ☆18 Updated 4 years ago
- Set of vectorizers that extract keyphrases with part-of-speech patterns from a collection of text documents and convert them into a docum… ☆266 Updated last year
- This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs ☆188 Updated 2 years ago
- Code and model checkpoints for the MultiVerS model for scientific claim verification. ☆52 Updated 2 years ago
- Text span utilities for Rust and Python ☆22 Updated 3 years ago
- A multilingual version of MS MARCO passage ranking dataset ☆147 Updated 2 years ago
- TimeLMs: Diachronic Language Models from Twitter ☆112 Updated last year
- Entity Disambiguation as text extraction (ACL 2022) ☆182 Updated 3 years ago
- Implementation of the ClausIE information extraction system for python+spacy ☆227 Updated 3 years ago
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 Updated last year