qburst / common-crawl-malayalam
Useful tools to extract Malayalam text from the Common Crawl datasets
☆27 Updated this week
Related projects
Alternatives and complementary repositories for common-crawl-malayalam
- ☆11 Updated 4 years ago
- ☆28 Updated 4 years ago
- [WIP] Behold, semantic-search, built over sentence-transformers to make it easy for search engineers to evaluate, optimise and deploy mod… ☆15 Updated last year
- Tooling to play around with multilingual machine translation for Indian Languages. ☆21 Updated 2 years ago
- Automatically check mismatch between code and comments using AI and ML ☆54 Updated 3 years ago
- Topic Inference with Zeroshot models ☆61 Updated last year
- Natural Language Generation for Gramex applications. ☆24 Updated 2 years ago
- Package that returns a company embedding given a company name ☆42 Updated 4 years ago
- Documentation effort for the BookCorpus dataset ☆33 Updated 3 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆42 Updated 5 years ago
- semantically distinct key phrase extraction using hilbert hashes. ☆48 Updated 2 years ago
- Alternate Implementation for Zero Shot Text Classification: Instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models… ☆37 Updated 2 years ago
- A News Article Collection Library ☆22 Updated last year
- Question Generation - Question Answering for Automatic Flashcards ☆64 Updated 2 years ago
- NeatText: a simple NLP package for cleaning textual data and text preprocessing ☆68 Updated 11 months ago
- ☆29 Updated last year
- ☆16 Updated last year
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆62 Updated 8 months ago
- State of the art open-source translation for Indic languages. ☆4 Updated 3 years ago
- Pyinfer is a model agnostic tool for ML developers and researchers to benchmark the inference statistics for machine learning models or f… ☆24 Updated 3 years ago
- ☆17 Updated last year
- simple rule based named entity recognition ☆43 Updated 2 years ago
- ☆31 Updated 5 years ago
- Generating boolean (yes/no) questions from any content using T5 text-to-text transformer model and BoolQ dataset ☆35 Updated last year
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP-tasks with legal texts. ☆37 Updated 5 years ago
- NLP tool to extract emotional phrases from tweets 🤩 ☆40 Updated 3 years ago
- ☆13 Updated 3 years ago
- A curated list of awesome ML frameworks & libraries for text data ☆16 Updated last year
- Preprocessing and analysis for training SNOMED-CT concept embeddings from CORD-19 corpus ☆14 Updated last year
- A PyPI package for easy text annotation in a Jupyter Notebook. ☆28 Updated 3 years ago