erikavaris/tokenizer

Tokenizer for Twitter and Reddit data

☆45

Alternatives and similar repositories for tokenizer

Users that are interested in tokenizer are comparing it to the libraries listed below.

Institute-Web-Science-and-Technologies / CLEARumor
CLEARumor: ConvoLving ELMo against Rumors
☆11 · Jul 25, 2024 · Updated last year
mlukasik / rumour-classification
Code to reproduce experiments from the EMNLP 2015 paper about Rumour Stance Classification with Gaussian Processes.
☆37 · May 23, 2016 · Updated 10 years ago
azubiaga / pheme-twitter-conversation-collection
Twitter conversation collection script, which collects all replies to a given tweet
☆69 · Jan 21, 2016 · Updated 10 years ago
jacobeisenstein / jos-gender-2014
Software for the paper "Gender and Lexical Variation in Social Media" with David Bamman and Tyler Schnoebelen
☆17 · Nov 10, 2015 · Updated 10 years ago
GateNLP / broad_twitter_corpus
The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors (COLING 2016…
☆69 · May 12, 2022 · Updated 4 years ago
ahoho / SentiVAE
☆15 · Apr 9, 2019 · Updated 7 years ago
zycdev / L2R2
PyTorch implementation of L2R2 in SIGIR 2020
☆17 · Jun 12, 2023 · Updated 3 years ago
YunseokJANG / amc-gan
☆13 · Jul 13, 2018 · Updated 8 years ago
trappmartin / BNP.jl
Bayesian nonparametrics in Julia
☆10 · Dec 2, 2016 · Updated 9 years ago
hiaoxui / nugget
☆11 · Aug 1, 2024 · Updated last year
sheffieldnlp / stance-semeval2016
USFD submission code for Semeval 2016 Task 6, Subtask B
☆25 · Feb 24, 2016 · Updated 10 years ago
myleott / ark-twokenize-py
Python port of the Twokenize class of ark-tweet-nlp
☆143 · May 4, 2018 · Updated 8 years ago
vered1986 / Chirps
A Large Automatically-Constructed Resource of Predicate Paraphrases
☆45 · Apr 3, 2020 · Updated 6 years ago
wookayin / alfred-arxiv-workflow
🔎 Alfred workflow to search arxiv.org items
☆25 · Aug 30, 2023 · Updated 2 years ago
sheffieldnlp / stance-conditional
Stance Detection with Conditional Encoding
☆70 · Jan 14, 2017 · Updated 9 years ago
orenmel / word2parvec
A toolkit for generating paraphrase vector representations for words in context
☆23 · May 19, 2015 · Updated 11 years ago
ushahidi / suckapy
The Python port of sucka.
☆20 · Mar 16, 2015 · Updated 11 years ago
UniversalDependencies / UD_French-Sequoia
Data from the Sequoia treebank.
☆11 · May 6, 2026 · Updated 2 months ago
redpony / cpyp
C++ library for modeling with Pitman-Yor processes
☆34 · Nov 28, 2017 · Updated 8 years ago
modestyachts / cifar-10.2
Host CIFAR-10.2 Data Set
☆13 · Sep 22, 2021 · Updated 4 years ago
leondz / entity_recognition
framework for doing NER and other types of entity recognition, in Python
☆68 · Jun 21, 2022 · Updated 4 years ago
Kyubyong / mtp
Multi-lingual Text Processing
☆96 · Jan 22, 2019 · Updated 7 years ago
wjko2 / Linguistically-Informed-Specificity-and-Semantic-Plausibility-for-Dialogue-Generation
☆10 · Jun 11, 2019 · Updated 7 years ago
seilna / RWMN
Repository for our ICCV 2017 paper: A Read Write Network for Movie Story Understanding
☆85 · Apr 13, 2018 · Updated 8 years ago
seo-95 / MTSI-BERT
Multi-Turn-Single-Intent Bert model for dialogue session classification
☆25 · Dec 8, 2022 · Updated 3 years ago
timvieira / lazygrad
Lazily regularized updates for Adagrad with sparse features. Implemented in Cython for efficiency.
☆11 · Jan 2, 2021 · Updated 5 years ago
seilna / CNN-Units-in-NLP
Repository for our ICLR 2019 paper: Discovery of Natural Language Concepts in Individual Units of CNNs
☆26 · Mar 9, 2019 · Updated 7 years ago
williamleif / socialsent
Code and data for inducing domain-specific sentiment lexicons.
☆194 · Aug 2, 2024 · Updated last year
ibm-aur-nlp / domain-specific-QA
Extracting six domain-specific QA datasets from MS MARCO
☆17 · Dec 1, 2019 · Updated 6 years ago
carpedm20 / board
☆25 · Sep 10, 2019 · Updated 6 years ago
sean-chester / generalised-brown
C++ implementation of Generalised Brown clustering and python scripts for feature generation
☆41 · Apr 8, 2016 · Updated 10 years ago
jtwool / TwitterGenderPredictor
Python implementation of Sap et al.'s gender prediction algorithm for Twitter.
☆12 · Apr 7, 2018 · Updated 8 years ago
yairf11 / MUPPET
Code for the paper "multi-hop paragraph retrieval for open-domain question answering"
☆36 · Jun 21, 2022 · Updated 4 years ago
aminrj-labs / mcp-attack-labs
⏺ AI MCP Security Labs — hands-on exploits and defenses for Model Context Protocol tool poisoning, prompt injection, and agent
☆17 · Jun 12, 2026 · Updated last month
cbaziotis / ekphrasis
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenizati…
☆675 · Jun 2, 2025 · Updated last year
jeniyat / Candidacy-Template-OSU-CSE
☆10 · Dec 18, 2020 · Updated 5 years ago
mekarpeles / math.mx
A comprehensive graph of mathematical domains and topics
☆24 · Jan 8, 2022 · Updated 4 years ago
mgormley / pacaya-nlp
NLP Tools built with Pacaya
☆16 · Oct 30, 2017 · Updated 8 years ago
wch / vtest
Visual test system for R packages
☆14 · Sep 24, 2015 · Updated 10 years ago