Python scripts to read a Portuguese Wikipedia XML dump file, parse it and generate plain text files.
☆14 Mar 12, 2014 Updated 11 years ago
Alternatives and similar repositories for ptwiki2text
Users who are interested in ptwiki2text are comparing it to the libraries listed below.
- ☆15 Mar 2, 2014 Updated 11 years ago
- Maltparser trained with the Universal Dependency Treebank for Brazilian-Portuguese Language ☆12 May 25, 2015 Updated 10 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆25 Sep 14, 2016 Updated 9 years ago
- Handle linguistic corpus and convert it to use NLP tools ☆21 Jul 5, 2013 Updated 12 years ago
- Any contributions to the NLTK project ☆29 May 8, 2014 Updated 11 years ago
- A library that adds some NLP capabilities to the Lucene search engine ☆50 Jul 16, 2013 Updated 12 years ago
- doc and model for NDSB ☆31 Apr 15, 2015 Updated 10 years ago
- A framework, data and configs for generating and building Tesseract OCR lang.traineddata model files, specifically for Japanese ☆10 Dec 9, 2013 Updated 12 years ago
- This is not the official kaldi repository. It is better to fork https://github.com/kaldi-asr/kaldi or https://github.com/vimalmanohar/kal… ☆33 Aug 6, 2015 Updated 10 years ago
- maximum entropy based part-of-speech tagger for NLTK ☆45 Dec 8, 2016 Updated 9 years ago
- Miscellaneous materials for teaching NLP using NLTK ☆36 Dec 31, 2017 Updated 8 years ago
- ☆10 Jun 4, 2020 Updated 5 years ago
- Redis tcp map for postfix ☆12 Jun 28, 2024 Updated last year
- Focused Crawler for VT's CTRNet ☆10 May 13, 2013 Updated 12 years ago
- Madek main web interface ☆21 Updated this week
- Automatic Detection of Potentially Idiomatic Expressions ☆12 Feb 19, 2021 Updated 5 years ago
- Speech ANDroid Apps ☆20 Jan 22, 2014 Updated 12 years ago
- Simple CORPORA list crawler ☆10 Dec 2, 2016 Updated 9 years ago
- (Labeled) Latent Dirichlet Allocation on a sentence level with Gibbs Sampling ☆10 Mar 27, 2014 Updated 11 years ago
- "Save as DAISY" add-in for Microsoft Word ☆10 Dec 22, 2025 Updated 2 months ago
- Grecka is a Python script to convert Greek to Greeklish based on ELOT 743 ☆12 Aug 4, 2018 Updated 7 years ago
- Social Context Analysis aNd Emotion Recognition ☆12 Jul 11, 2017 Updated 8 years ago
- snf-image is a Ganeti OS definition. It allows Ganeti to launch instances from predefined or untrusted custom Images. The whole process o… ☆12 Feb 27, 2018 Updated 8 years ago
- Grapheme to phoneme converter for Estonian ☆14 May 27, 2021 Updated 4 years ago
- A Webpack boilerplate with ES6 and SCSS for simple web projects. ☆11 Oct 27, 2016 Updated 9 years ago
- A Scala Swing component that wraps javax.swing.JTree ☆15 Feb 4, 2013 Updated 13 years ago
- Automated svn2git mirror of include-what-you-use: link goes to upstream ☆13 May 27, 2015 Updated 10 years ago
- A plug-in architecture for extending Siri virtual assistant ☆29 Mar 30, 2014 Updated 11 years ago
- A duplicate data detector engine PoC based on Elasticsearch. ☆20 Apr 3, 2015 Updated 10 years ago
- Declarative unit testing for Answer Set Programming projects ☆12 Mar 4, 2018 Updated 7 years ago
- Backup tool for Apache Cassandra based on https://github.com/synack/tablesnap ☆22 Mar 26, 2013 Updated 12 years ago
- Python interface for the Berkeley Parser using JPype ☆12 Dec 18, 2015 Updated 10 years ago
- Brand disambiguator for tweets to differentiate e.g. Orange vs orange (brand vs foodstuff), using NLTK and scikit-learn ☆58 Jul 11, 2013 Updated 12 years ago
- Sending whispers across the interstellar space! ☆11 Aug 11, 2019 Updated 6 years ago
- Solarized style for Qt Creator's syntax highlighter ☆31 Aug 22, 2016 Updated 9 years ago
- Demo project for Continuous Integration - from the book Continuous Integration (Duvall et al.) ☆23 Jun 19, 2020 Updated 5 years ago
- A Windows DLL call helper ☆14 Dec 19, 2014 Updated 11 years ago
- Demonstrating technical elements in support of open source securitisation frameworks ☆14 Sep 5, 2024 Updated last year
- Cproto generates function prototypes and variable declarations from C source code. Cproto can also convert function definitions between t… ☆10 Jul 19, 2016 Updated 9 years ago