qburst / common-crawl-malayalam
Useful tools to extract Malayalam text from the Common Crawl datasets
☆27 Updated last month
Alternatives and similar repositories for common-crawl-malayalam:
Users who are interested in common-crawl-malayalam are comparing it to the libraries listed below:
- semantically distinct key phrase extraction using hilbert hashes. ☆48 Updated 2 years ago
- [WIP] Behold, semantic-search, built over sentence-transformers to make it easy for search engineers to evaluate, optimise and deploy mod… ☆15 Updated last year
- ☆22 Updated 3 years ago
- MoodCat😼 classifies the mood of English sentences. ☆14 Updated 2 years ago
- Using short models to classify long texts ☆21 Updated last year
- spaCy match and replace, maintaining conjugation ☆35 Updated 2 years ago
- Alternate Implementation for Zero Shot Text Classification: Instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models… ☆37 Updated 2 years ago
- ☆17 Updated 2 years ago
- Topic Inference with Zeroshot models ☆61 Updated last year
- ☆11 Updated 5 years ago
- Natural Language Generation for Gramex applications. ☆24 Updated 2 years ago
- ☆28 Updated 4 years ago
- Tooling to play around with multilingual machine translation for Indian Languages. ☆21 Updated 2 years ago
- 🦖 Streamlined Recommender Systems with TensorFlow and KubeFlow ☆18 Updated last year
- Cortex-compatible model server for Python and TensorFlow ☆17 Updated 2 years ago
- 🧬 A VS Code extension for annotating data with Prodigy ☆30 Updated 3 years ago
- Example Flask project to use Spacy on AWS Lambda and get the models from an S3 bucket ☆12 Updated 2 years ago
- spaCy entry points for Curated Transformers ☆26 Updated 4 months ago
- Summarize. is a Streamlit application that performs automatic text summarization using both extractive and abstractive models. ☆16 Updated 3 years ago
- Enterprise Solution for Text Classification (using BERT) ☆10 Updated 2 years ago
- A word embedding and graph-based keyword extraction tool ☆17 Updated 5 years ago
- Documentation effort for the BookCorpus dataset ☆33 Updated 3 years ago
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP-tasks with legal texts. ☆37 Updated 5 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated 10 months ago
- classify a job description (or noisy job title) into a ONET job title ☆18 Updated 8 years ago
- Material for the series of seminars on Large Language Models ☆31 Updated 9 months ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆76 Updated last year
- Loan Risk Prediction Neural Network and API ☆17 Updated 4 years ago
- A utility for labeling clusters of text data. ☆28 Updated 3 years ago
- Just another sentiment wrapper. ☆17 Updated 3 years ago