A list of tools for annotating data, managing annotations, etc.
★ 612 · Aug 1, 2024 · Updated last year
Alternatives and similar repositories for awesome-data-annotation
Users that are interested in awesome-data-annotation are comparing it to the libraries listed below.
- A curated list of awesome data labeling tools (★ 4,337 · Jun 17, 2024 · Updated last year)
- A curated list of awesome dataset tools (★ 937 · Jun 9, 2023 · Updated 3 years ago)
- A curated list of awesome data annotation tools (★ 223 · Oct 7, 2022 · Updated 3 years ago)
- Open source annotation tool for machine learning practitioners. (★ 10,675 · Apr 14, 2026 · Updated 2 months ago)
- Label Studio is a multi-type data labeling and annotation tool with standardized output format (★ 27,608 · Updated this week)
- Label Objects and Save Time (LOST) - Design your own smart Image Annotation process in a web-based environment. (★ 577 · Updated this week)
- Scalabel: A versatile web-based visual data annotation tool (★ 665 · Apr 17, 2025 · Updated last year)
- Collaborate & label any type of data, images, text, or documents, in an easy web interface or desktop app. (★ 2,065 · Mar 15, 2025 · Updated last year)
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets (★ 4,996 · Jun 8, 2026 · Updated last week)
- The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI C… (★ 1,904 · Nov 18, 2024 · Updated last year)
- A curated list of pretrained sentence and word embedding models (★ 2,288 · Apr 23, 2021 · Updated 5 years ago)
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… (★ 113 · Jan 24, 2025 · Updated last year)
- ML data annotations made super easy for teams. Just upload data, add your team and build training/evaluation dataset in hours. (★ 269 · Nov 28, 2021 · Updated 4 years ago)
- Smarter Manual Annotation for Resource-constrained collection of Training data (★ 230 · Dec 2, 2024 · Updated last year)
- skweak: A software toolkit for weak supervision applied to NLP tasks (★ 926 · Sep 2, 2024 · Updated last year)
- Computer Vision Annotation Tool (CVAT) is a leading platform for building high-quality visual datasets for vision AI. It offers open-sour… (★ 16,055 · Updated this week)
- Extract the MODS/ALTO metadata of a bunch of METS/ALTO files into pandas DataFrames for data analysis (★ 13 · Aug 21, 2025 · Updated 9 months ago)
- (★ 13 · Dec 4, 2017 · Updated 8 years ago)
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c… (★ 358 · Feb 22, 2022 · Updated 4 years ago)
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models (★ 63 · Dec 6, 2022 · Updated 3 years ago)
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by the Incorporated… (★ 27 · Oct 4, 2022 · Updated 3 years ago)
- A curated list of awesome resources dedicated to Relation Extraction, one of the most important tasks in Natural Language Processing (… (★ 1,225 · Jan 27, 2022 · Updated 4 years ago)
- Streamlit component for TensorBoard, TensorFlow's visualization toolkit (★ 42 · Dec 8, 2021 · Updated 4 years ago)
- A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning (★ 20,607 · Jun 4, 2026 · Updated last week)
- A very simple framework for state-of-the-art Natural Language Processing (NLP) (★ 14,376 · Oct 27, 2025 · Updated 7 months ago)
- This is a document concerning Data Readiness in the context of machine learning and Natural Language Processing. (★ 13 · Oct 14, 2021 · Updated 4 years ago)
- A system for quickly generating training data with weak supervision (★ 5,975 · Jun 8, 2026 · Updated last week)
- Extracting six domain-specific QA datasets from MS MARCO (★ 17 · Dec 1, 2019 · Updated 6 years ago)
- logboard: Monitor and Compare Logs on Browser/Terminal. (★ 21 · Sep 19, 2019 · Updated 6 years ago)
- SSL using PyTorch (★ 48 · Jan 12, 2020 · Updated 6 years ago)
- Open source audio annotation tool for humans (★ 1,139 · Feb 3, 2026 · Updated 4 months ago)
- Data augmentation for NLP (★ 4,658 · Updated this week)
- A curated list of NLP resources focused on Transformer networks, attention mechanism, GPT, BERT, ChatGPT, LLMs, and transfer learning. (★ 1,145 · Oct 27, 2024 · Updated last year)
- A collection of resources (including the papers and datasets) of OCR (Optical Character Recognition). (★ 431 · Jan 5, 2025 · Updated last year)
- HMTL: Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP (★ 1,195 · Aug 1, 2023 · Updated 2 years ago)
- AlpacaTag: An Active Learning-based Crowd Annotation Framework for Sequence Tagging (ACL 2019 Demo) (★ 137 · Jan 5, 2023 · Updated 3 years ago)
- An open-source NLP research library, built on PyTorch. (★ 11,892 · Nov 22, 2022 · Updated 3 years ago)
- Active Learning for Text Classification in Python (★ 644 · May 24, 2026 · Updated 3 weeks ago)
- Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data … (★ 11,511 · Jan 13, 2026 · Updated 5 months ago)