cbaziotis / ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora: English Wikipedia and Twitter (330 million English tweets).
☆661 · Updated 8 months ago
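The word segmentation mentioned in the description (splitting a hashtag into words using word statistics) can be illustrated with a small, self-contained sketch. This is not ekphrasis's code or API: the `segment` and `word_logprob` functions are hypothetical names, the counts in `TOY_COUNTS` are invented for illustration, and the unknown-word penalty is a common heuristic assumed here, whereas the real tool draws its statistics from the Wikipedia and Twitter corpora.

```python
import math
from functools import lru_cache

# Toy unigram counts, invented for illustration only. A real system would
# load counts gathered from large corpora instead.
TOY_COUNTS = {"this": 100, "is": 200, "a": 300, "test": 50, "at": 80, "his": 60}
TOTAL = sum(TOY_COUNTS.values())
MAX_WORD_LEN = 20  # upper bound on the length of a candidate word


def word_logprob(word):
    """Log-probability of a word under the toy unigram model."""
    count = TOY_COUNTS.get(word)
    if count is None:
        # Assumed heuristic: unknown words are penalized, and the penalty
        # grows by a factor of 10 for every extra character.
        return math.log(10.0 / (TOTAL * 10 ** len(word)))
    return math.log(count / TOTAL)


def segment(text):
    """Split text into the word sequence with the highest total log-probability."""
    text = text.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def best(start):
        # Returns (score, words) for the best segmentation of text[start:].
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
            word = text[start:end]
            rest_score, rest_words = best(end)
            candidates.append((word_logprob(word) + rest_score, (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])


if __name__ == "__main__":
    print(segment("#thisisatest"))  # ['this', 'is', 'a', 'test']
```

With a unigram model like this, the quality of the split depends entirely on the counts, which is why statistics from a large in-domain corpus such as tweets matter for hashtag text.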
Related projects
Alternatives and complementary repositories for ekphrasis
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy ☆723 · Updated 2 months ago
- semi supervised guided topic model with custom guidedLDA ☆499 · Updated 4 years ago
- Python Keyphrase Extraction module ☆1,562 · Updated last year
- End-to-end Neural Coreference Resolution ☆524 · Updated 2 years ago
- Compute Sentence Embeddings Fast! ☆618 · Updated last year
- EmbedRank: Unsupervised Keyphrase Extraction using Sentence Embeddings (official implementation) ☆432 · Updated last year
- Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classif… ☆306 · Updated 4 years ago
- Deep-learning model presented in "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentimen… ☆196 · Updated 6 years ago
- PyTorch deep learning models for document classification ☆595 · Updated last year
- Calculates Word Mover's Distance Insanely Fast ☆460 · Updated last year
- Topic Modeling in Embedding Spaces ☆541 · Updated last year
- GSDMM: Short text clustering ☆353 · Updated last year
- BERT for Coreference Resolution ☆445 · Updated last year
- 🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy ☆1,351 · Updated 5 months ago
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) ☆1,184 · Updated last month
- sentence embedding by Smooth Inverse Frequency weighting scheme ☆1,084 · Updated 5 years ago
- Semantic Text Similarity Dataset Hub ☆715 · Updated 6 years ago
- General purpose unsupervised sentence representations ☆1,192 · Updated 2 years ago
- Datasets to train supervised classifiers for Named-Entity Recognition in different languages (Portuguese, German, Dutch, French, English) ☆338 · Updated 2 years ago
- Super easy library for BERT based NLP models ☆1,863 · Updated 2 months ago
- A framework to learn cross-lingual word embedding mappings ☆645 · Updated last year
- A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of lang… ☆1,505 · Updated 4 months ago
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an… ☆555 · Updated 2 years ago
- Sentence paraphrase generation at the sentence level ☆407 · Updated last year
- Data repository for pretrained NLP models and NLP corpora. ☆983 · Updated 6 years ago
- Neat (Neural Attention) Vision, is a visualization tool for the attention mechanisms of deep-learning models for Natural Language Process… ☆250 · Updated 6 years ago
- Text Similarity ☆404 · Updated 4 years ago
- Package for evaluating word embeddings ☆436 · Updated 3 years ago
- BERTweet: A pre-trained language model for English Tweets (EMNLP-2020) ☆574 · Updated 3 months ago
- An elaborate and exhaustive paper list for Named Entity Recognition (NER) ☆394 · Updated 2 years ago