qburst / common-crawl-malayalam
Useful tools to extract Malayalam text from the Common Crawl datasets
☆28Updated 8 months ago
Alternatives and similar repositories for common-crawl-malayalam
Users that are interested in common-crawl-malayalam are comparing it to the libraries listed below
- Semantically distinct key phrase extraction using Hilbert hashes.☆50Updated 3 years ago
- [WIP] Behold, semantic-search, built over sentence-transformers to make it easy for search engineers to evaluate, optimise and deploy mod…☆15Updated 2 years ago
- Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine☆242Updated 2 years ago
- Topic inference with zero-shot models☆61Updated 2 years ago
- A comprehensive tool for linguistic analysis of communities☆49Updated 3 years ago
- Automatically check for mismatches between code and comments using AI and ML☆53Updated 4 years ago
- Text classification AutoML☆21Updated 4 years ago
- Framework for building and maintaining self-updating prompts for LLMs☆64Updated last year
- NeatText: a simple NLP package for cleaning textual data and text preprocessing☆72Updated last year
- ☆47Updated 2 years ago
- NLP tool to extract emotional phrases from tweets 🤩☆40Updated 3 years ago
- ☆57Updated 3 years ago
- Language detection using spaCy and fastText☆57Updated last year
- Conversational text analysis using various NLP techniques☆181Updated 2 years ago
- A comprehensive reference for all topics related to building and maintaining microservices☆67Updated 2 years ago
- Fastlaw's purpose is to replace generic word embeddings in supervised machine learning NLP tasks on legal texts.☆38Updated 6 years ago
- Question Generation - Question Answering for Automatic Flashcards☆66Updated 3 years ago
- A utility for labeling clusters of text data.☆28Updated 4 years ago
- Information extraction from English and German texts based on predicate logic☆138Updated 2 years ago
- Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.☆92Updated 3 years ago
- Interactive tree-maps with SBERT & Hierarchical Clustering (HAC)☆30Updated 8 months ago
- Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs).☆73Updated last year
- A curated list of awesome ML frameworks & libraries for text data☆16Updated 2 years ago
- ☆43Updated 2 years ago
- A library to synthesize text datasets using Large Language Models (LLMs)☆152Updated 2 years ago
- Alternate implementation for zero-shot text classification: instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models…☆37Updated 3 years ago
- Tooling to play around with multilingual machine translation for Indian languages.☆22Updated 3 years ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy.☆80Updated 2 years ago
- Deploy transformers serverless on AWS Lambda☆122Updated 4 years ago
- Supervised instruction fine-tuning for LLMs with HF Trainer and DeepSpeed☆35Updated 2 years ago