cyb3rk0tik / pyfranc
Text language detection based on trigrams.
☆13Updated last year
Alternatives and similar repositories for pyfranc:
Users who are interested in pyfranc are comparing it to the libraries listed below.
- Meme generator in Bash☆21Updated last year
- code and data used to build a training dataset for dragnet models☆10Updated 4 years ago
- Experiments with Hugging Face 🔬 🤗☆44Updated 8 months ago
- Targeted language identifier, based on FastText and Hunspell.☆34Updated 2 months ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.☆18Updated last year
- Extracts plain text, language identification and more metadata from WARC records☆21Updated last month
- Loadable spellfix1 extension for sqlite as python package☆26Updated last year
- Website and documentation☆20Updated 4 months ago
- Scripts for building a geo-located web corpus using Common Crawl data☆11Updated last week
- Deep learning-based reverse image search using the Annoy library☆17Updated 6 years ago
- ☆14Updated 2 years ago
- Tools for encoding Magic: The Gathering cards into a form suitable for AI text generation☆19Updated 3 years ago
- Fast Neural Machine Translation in C++ - development repository☆19Updated 11 months ago
- Keyword extraction with spaCy☆31Updated 3 years ago
- Faster, modernized fork of the language identification tool langid.py☆55Updated 5 months ago
- Tool to generate paraphrases of sentences in many languages.☆84Updated 3 years ago
- Efficient teacher-student models and scripts to make them☆50Updated last year
- Automatically exported from code.google.com/p/guess-language☆53Updated last year
- Extract knowledge from raw text☆13Updated 3 years ago
- Matrix-based News Aggregation to Explore Media Bias☆20Updated 6 years ago
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀☆32Updated 3 years ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t…☆33Updated 2 years ago
- Generate a SQLite database from Wikipedia & Wikidata dumps.☆35Updated last year
- XAI based human-in-the-loop framework for automatic rule-learning.☆48Updated 9 months ago
- Tool to extract the text from web article URLs and get word frequencies, entity recognition, automatic summaries and more☆20Updated 6 years ago
- Text classification automl☆21Updated 3 years ago
- A CLI tool for managing OpenAI batch processing jobs with ease.☆35Updated 8 months ago
- Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing large text collections with ML and for ML☆63Updated 3 months ago
- Transformer-based approaches for efficient docstring generation for Python code.☆16Updated 4 years ago
- An index data structure for approximate string search.☆23Updated 5 years ago