qburst / common-crawl-malayalam
Useful tools to extract Malayalam text from the Common Crawl datasets
☆28 · Updated 10 months ago
Alternatives and similar repositories for common-crawl-malayalam
Users who are interested in common-crawl-malayalam are comparing it to the libraries listed below.
- Semantically distinct key phrase extraction using Hilbert hashes. ☆50 · Updated 3 years ago
- Topic inference with zero-shot models ☆61 · Updated 2 years ago
- NeatText, a simple NLP package for cleaning textual data and text preprocessing ☆72 · Updated last year
- A comprehensive tool for linguistic analysis of communities ☆49 · Updated 4 years ago
- [WIP] Behold, semantic-search, built on sentence-transformers to make it easy for search engineers to evaluate, optimise and deploy mod… ☆15 · Updated 2 years ago
- Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine ☆243 · Updated 2 years ago
- A utility for labeling clusters of text data. ☆28 · Updated 4 years ago
- ☆47 · Updated 2 years ago
- Information extraction from English and German texts based on predicate logic ☆139 · Updated 2 years ago
- Tooling to play around with multilingual machine translation for Indian languages. ☆22 · Updated 3 years ago
- A curated list of awesome ML frameworks & libraries for text data ☆16 · Updated 2 years ago
- Automatically check for mismatches between code and comments using AI and ML ☆54 · Updated 4 years ago
- Language detection using spaCy and fastText ☆57 · Updated last year
- spaCy match and replace, maintaining conjugation ☆35 · Updated 2 years ago
- A PyPI package for easy text annotation in a Jupyter Notebook. ☆28 · Updated 4 years ago
- ☆43 · Updated 2 years ago
- Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health. ☆92 · Updated 3 years ago
- Question Generation - Question Answering for Automatic Flashcards ☆66 · Updated 3 years ago
- Fastlaw aims to replace generic word embeddings in supervised machine learning NLP tasks on legal texts. ☆40 · Updated 6 years ago
- Custom Natural Language Processing with big and small models 🌲🌱 ☆66 · Updated 4 years ago
- Text classification AutoML ☆21 · Updated 4 years ago
- ☆57 · Updated 3 years ago
- ☆30 · Updated 3 years ago
- Interactive tree-maps with SBERT & Hierarchical Clustering (HAC) ☆30 · Updated 10 months ago
- Asent is a Python library for performing efficient and transparent sentiment analysis using spaCy. ☆120 · Updated 2 weeks ago
- 🌸 Train floret vectors ☆18 · Updated 2 years ago
- Alternate implementation for zero-shot text classification: instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models… ☆37 · Updated 3 years ago
- This is the second part of the Deep Learning Course for the Master in High-Performance Computing (SISSA/ICTP). ☆33 · Updated 5 years ago
- ☆55 · Updated last year
- Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs). ☆76 · Updated last week