cyb3rk0tik / pyfranc
Text language detection based on trigrams.
☆13Updated last year
Alternatives and similar repositories for pyfranc:
Users who are interested in pyfranc are comparing it to the libraries listed below.
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.☆18Updated last year
- Scrape Hacker News replies☆26Updated 3 years ago
- Faster, modernized fork of the language identification tool langid.py☆55Updated 4 months ago
- A text processing tool including tag(HTML, URL, Email) extraction and removing, punctuation normalization, simple segmentation, and so on…☆11Updated 3 months ago
- ☆14Updated last year
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t…☆33Updated 2 years ago
- Using Machine Learning to Create Funny Memes☆25Updated 2 years ago
- Tools for scraping YouTube video metadata (mostly for training AI on video titles)☆40Updated 3 years ago
- ☆17Updated last week
- Datamallet is a Python library which contains several helper functions and modules for the common tasks in a typical data science workflow…☆11Updated 2 years ago
- Polyglot skipgram embeddings, and their many health benefits☆12Updated 5 years ago
- This project lets you extract glossary words and their definitions from a given piece of text automatically using NLP techniques☆29Updated 4 years ago
- Guess the Hacker News titles☆11Updated 3 years ago
- Tools for encoding Magic: The Gathering cards into a form suitable for AI text generation☆19Updated 3 years ago
- Automatically exported from code.google.com/p/guess-language☆53Updated last year
- Tools for building SQLite databases from files and directories☆12Updated last year
- Generate variations of text through synonym matching☆12Updated 7 years ago
- Async bulk data ingestion and querying in various document, graph and vector databases via their Python clients☆36Updated last year
- This repository provides various Python methods for finding and aggregating synonyms for an individual word or a list of words.☆34Updated 2 years ago
- Functional composable pipelines allowing clean separation of the business logic and its implementation☆11Updated 10 months ago
- Match celebrity users with their respective tweets by making use of Semantic Textual Similarity on 900+ celebrity users' 2.5 million…☆13Updated last year
- Stock market analyzer built on Tweepy, Elasticsearch and NLTK☆12Updated 3 years ago
- An awesome list of gpt-3 experiments + outputted newsletter.☆17Updated 4 years ago
- A CLI tool for managing OpenAI batch processing jobs with ease.☆34Updated 7 months ago
- Boolean text search in Python☆45Updated 2 years ago
- MirrorDataGenerator is a python tool that generates synthetic data based on user-specified causal relations among features in the data. I…☆21Updated 2 years ago
- Code that accompanies the PyData New York (2022) talk: Addressing the sensitivity of Large language models☆13Updated 2 years ago
- An easy way to use the released TransCoder by Facebook AI Research to convert code from one programming language to another using unsuper…☆23Updated 4 years ago
- Natural Language Processing Project☆10Updated 3 years ago
- A scraping Master-slave system based on Google App Engine☆11Updated 4 years ago