cyb3rk0tik / pyfranc
Text language detection based on trigrams.
☆16 · Updated 2 years ago
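The description above says only that detection is "based on trigrams". The sketch below shows what trigram-based detection generally looks like. It assumes the classic Cavnar-Trenkle approach: build a ranked profile of the most frequent character trigrams for each language, then pick the language whose profile has the smallest "out-of-place" rank distance from the input's profile. This is not pyfranc's actual API or data. The function names, `PROFILE_SIZE`, and the toy `SAMPLES` sentences are hypothetical, and real detectors build their profiles from large corpora.

```python
import re
from collections import Counter

PROFILE_SIZE = 300  # hypothetical: how many top trigrams to keep per profile


def trigrams(text: str) -> list[str]:
    """Split text into space-padded character trigrams (letters only, lowercased)."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of Unicode letters
    padded = " " + " ".join(words) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def profile(text: str, size: int = PROFILE_SIZE) -> dict[str, int]:
    """Map each of the `size` most frequent trigrams to its rank (0 = most frequent)."""
    counts = Counter(trigrams(text))
    return {tri: rank for rank, (tri, _) in enumerate(counts.most_common(size))}


def distance(doc: dict[str, int], lang: dict[str, int], penalty: int = PROFILE_SIZE) -> int:
    """Out-of-place distance: sum of rank differences, with a fixed penalty for missing trigrams."""
    return sum(abs(rank - lang[tri]) if tri in lang else penalty
               for tri, rank in doc.items())


def detect(text: str, lang_profiles: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Return (language, distance) pairs, closest match first."""
    doc = profile(text)
    scores = [(code, distance(doc, prof)) for code, prof in lang_profiles.items()]
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical toy training text; far too small for real-world use.
SAMPLES = {
    "eng": "the quick brown fox jumps over the lazy dog and the cat is sleeping "
           "in the house while the children are playing with their friends",
    "deu": "der schnelle braune fuchs springt über den faulen hund und die katze "
           "schläft im haus während die kinder mit ihren freunden spielen",
    "fra": "le renard brun rapide saute par dessus le chien paresseux et le chat "
           "dort dans la maison pendant que les enfants jouent avec leurs amis",
}
PROFILES = {code: profile(text) for code, text in SAMPLES.items()}

if __name__ == "__main__":
    print(detect("the children are playing in the house", PROFILES))
```

The fixed penalty for unseen trigrams is what makes short inputs usable. A language that shares few trigrams with the input accumulates large penalties quickly, so it ranks below a language whose profile merely orders the shared trigrams differently.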
Alternatives and similar repositories for pyfranc
Users who are interested in pyfranc are comparing it to the libraries listed below.
- Faster, modernized fork of the language identification tool langid.py ☆60 · Updated last year
- Benchmark scripts for comparing different tokenizers and sentence segmenters of German ☆12 · Updated 2 years ago
- Stuttgart Finite State Transducer system ☆23 · Updated 5 months ago
- Targeted language identifier, based on FastText and Hunspell. ☆38 · Updated 4 months ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆143 · Updated 2 months ago
- Extract dates from text ☆66 · Updated 4 years ago
- Translate HTML using Argos Translate ☆57 · Updated 2 years ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format ☆46 · Updated 9 months ago
- Automatically exported from code.google.com/p/guess-language ☆54 · Updated 2 months ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆34 · Updated 2 years ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆19 · Updated 2 years ago
- Fast Neural Machine Translation in C++ - development repository ☆22 · Updated last year
- This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions. ☆126 · Updated last year
- Find duplicate text files. ☆15 · Updated last year
- Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt). ☆53 · Updated last year
- Training scripts for Argos Translate ☆153 · Updated last month
- Arquivo.pt's main goal is the preservation of, and access to, web content that is no longer available online. During the development of the PW… ☆52 · Updated 2 months ago
- Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents ☆12 · Updated 3 years ago
- Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing large text collections with ML and for ML ☆65 · Updated 11 months ago
- Summarize your video to any duration. ☆39 · Updated 3 years ago
- Python Unicode Block Utilities ☆24 · Updated 2 months ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆113 · Updated 11 months ago
- Matrix-based News Aggregation to Explore Media Bias ☆20 · Updated 7 years ago
- Lightning Fast Language Prediction 🚀 ☆167 · Updated 4 months ago
- Labeled segmentation for the document structure of printed books ☆16 · Updated 8 years ago
- Boolean text search in Python ☆46 · Updated 6 months ago
- A dataset of multinational first names and last names ☆27 · Updated 2 years ago
- A tidy and complete archive of metadata for papers on arxiv.org, 1993-2019 ☆28 · Updated 6 years ago
- Deep-learning-based reverse image search using the Annoy library ☆15 · Updated 6 years ago
- Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your com… ☆134 · Updated 2 months ago