Setu is a comprehensive pipeline designed to clean, filter, and deduplicate diverse data sources including Web, PDF, and Speech data. Built on Apache Spark, Setu encompasses four key stages: document preparation, document cleaning and analysis, flagging and filtering, and deduplication.
☆16 · May 17, 2024 · Updated last year
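The four stages named above can be illustrated with a minimal, single-machine sketch. This is plain Python, not Setu's Spark code. The `Doc` record, every function name, the `min_words` threshold, the single "too short" flag, and the exact-hash deduplication are hypothetical stand-ins chosen for illustration. They are not Setu's actual rules or API.

```python
# Hypothetical, single-machine illustration of a four-stage cleaning pipeline.
# Not Setu's API: all names, rules, and thresholds here are assumptions.
import hashlib
import re
import unicodedata
from dataclasses import dataclass, field


@dataclass
class Doc:
    """A minimal document record (hypothetical; not Setu's schema)."""
    doc_id: str
    text: str
    flags: dict = field(default_factory=dict)


def prepare(raw_texts):
    """Stage 1 (document preparation): wrap raw strings as Doc records."""
    return [Doc(doc_id=str(i), text=t) for i, t in enumerate(raw_texts)]


def clean_and_analyze(docs):
    """Stage 2 (cleaning and analysis): normalize Unicode to NFC, collapse
    whitespace, and record a word count for later filtering."""
    for d in docs:
        text = unicodedata.normalize("NFC", d.text)
        text = re.sub(r"\s+", " ", text).strip()
        d.text = text
        d.flags["n_words"] = len(text.split())
    return docs


def flag_and_filter(docs, min_words=3):
    """Stage 3 (flagging and filtering): set flags first, then drop flagged
    docs. The single 'too_short' rule is an illustrative placeholder."""
    for d in docs:
        d.flags["too_short"] = d.flags["n_words"] < min_words
    return [d for d in docs if not d.flags["too_short"]]


def deduplicate(docs):
    """Stage 4 (deduplication): exact dedup on a SHA-256 hash of the
    lowercased text, keeping the first occurrence in each duplicate group."""
    seen = set()
    kept = []
    for d in docs:
        key = hashlib.sha256(d.text.lower().encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(d)
    return kept


def run_pipeline(raw_texts, min_words=3):
    """Chain the four stages in order."""
    docs = prepare(raw_texts)
    docs = clean_and_analyze(docs)
    docs = flag_and_filter(docs, min_words=min_words)
    return deduplicate(docs)


if __name__ == "__main__":
    sample = [
        "Hello   world, this is  a test.",
        "hello world, this is a test.",
        "too short",
        "  Another distinct document here. ",
    ]
    for doc in run_pipeline(sample):
        print(doc.doc_id, doc.text)
    # prints:
    # 0 Hello world, this is a test.
    # 3 Another distinct document here.
```

Each stage takes and returns a list of records, so the stages compose the way transformations would over a distributed collection in Spark. Exact hashing only catches texts that are identical after normalization and lowercasing. Catching near-duplicates would require a different technique, such as a similarity-based method.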
Alternatives and similar repositories for setu
Users that are interested in setu are comparing it to the libraries listed below.
- bulk image downloader freeware, reddit bulk image downloader, bulk image downloader extension, bulk image downloader from url, bulk image… ☆25 · Feb 19, 2026 · Updated last month
- A blueprint for creating Pretraining and Fine-Tuning datasets for Indic languages ☆397 · Oct 7, 2024 · Updated last year
- Transcripts for various Youtube Channels inspired by https://karpathy.ai/lexicap/index.html ☆16 · Nov 14, 2025 · Updated 5 months ago
- A swarm of LLM agents that will help you test, document, and productionize your code! ☆16 · Mar 30, 2026 · Updated 2 weeks ago
- Golang news aggregator mobile application written in React Native (source: www.golangnews.com) ☆13 · Updated this week
- 🔬 Writing reliable & fault-tolerant microservices with https://nats.io ☆16 · Mar 27, 2018 · Updated 8 years ago
- Library for working with binary options broker APIs ☆15 · Jun 7, 2021 · Updated 4 years ago
- Binance-API is a fast and lightweight Golang implementation for Binance API, providing complete API coverage, and supports both REST API,… ☆10 · Aug 29, 2023 · Updated 2 years ago
- Doccano annotation server together with a Spacy backend ☆11 · Apr 5, 2023 · Updated 3 years ago
- A Background Task Manager built on top of NATS.io ☆12 · Mar 24, 2026 · Updated 3 weeks ago
- go-active-learning is a command line annotation tool for binary classification problem written in Go. ☆15 · Apr 3, 2021 · Updated 5 years ago
- High Performance Go-Fiber Microservice to Track UniswapV3 Liquidity Pools ☆15 · Oct 7, 2023 · Updated 2 years ago
- The official evaluation suite and dynamic data release for MixEval. ☆11 · Sep 23, 2024 · Updated last year
- A Go client to receive real-time data messages from Polymarket ☆13 · Jun 25, 2025 · Updated 9 months ago
- Declarative environment variable validation for Go. ☆14 · Updated this week
- Automated social media post sharing ☆11 · Jan 5, 2022 · Updated 4 years ago
- The Complete Blockchain Professional Course, published by Packt ☆11 · Jan 30, 2023 · Updated 3 years ago
- zmq and nanomsg in pure Go (Golang) ☆35 · Nov 6, 2013 · Updated 12 years ago
- ☆12 · Mar 20, 2023 · Updated 3 years ago
- Automated content cross posting from Notion Database to Dev.to, Hashnode, Medium, Twitter, and LinkedIn using GitHub Actions. ☆13 · Oct 21, 2024 · Updated last year
- End to End chatbot implemented in PyTorch ☆12 · Apr 26, 2020 · Updated 5 years ago
- A simple crypto Telegram Bot based on chart-img.com API. The bot replies to the user with a screenshot of crypto market charts. ☆11 · Mar 31, 2023 · Updated 3 years ago
- ☆14 · May 25, 2024 · Updated last year
- Orderbook implementation in go with red black tree ☆14 · Sep 4, 2018 · Updated 7 years ago
- Viewer for text datasets in formats like HuggingFace, JSONL, etc. ☆15 · Feb 25, 2025 · Updated last year
- Checks a set of proxies against different conditions: whether the proxy works, whether it bypasses Cloudflare, and so on. ☆13 · Mar 8, 2020 · Updated 6 years ago
- link archive for year 2024 ☆18 · May 7, 2025 · Updated 11 months ago
- Task management for AI agents ☆15 · Jun 25, 2025 · Updated 9 months ago
- The template to pack your Flask + Vue web app. ☆10 · Mar 6, 2023 · Updated 3 years ago
- The format that's super! ☆36 · Mar 17, 2026 · Updated 3 weeks ago
- A curated collection of 650+ AI tools for productivity, creativity, and innovation. Contribute via pull requests to join the community! E… ☆15 · Jun 25, 2025 · Updated 9 months ago
- One command automated macOS/Linux laptop/VM/container bootstrapper. ☆18 · Apr 8, 2026 · Updated last week
- Natural Russian Language Processing by the Keys ☆12 · May 27, 2020 · Updated 5 years ago
- Golang implementation of the Aviasales API for data access ☆10 · Apr 15, 2017 · Updated 9 years ago
- A simple agent powered by LLMs that performs tasks. ☆14 · Apr 25, 2025 · Updated 11 months ago
- Building a multi-agent RAG system with advanced RAG methods ☆12 · Jan 12, 2025 · Updated last year
- Auto-generated sphinx version of the IPython website. Since this is an auto-generated directory, do *not* submit pull requests against th… ☆11 · Jan 3, 2026 · Updated 3 months ago
- Text Summarization using Transformer on GPU Docker Deployment ☆15 · Jul 26, 2022 · Updated 3 years ago
- rUv-Engineer - lets you describe UI using your imagination, then see it rendered live. ☆12 · Sep 28, 2024 · Updated last year