🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
★ 21,352 · Mar 25, 2026 · Updated last week
Alternatives and similar repositories for datasets
Users that are interested in datasets are comparing it to the libraries listed below.
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… ★ 158,637 · Updated this week
- 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production ★ 10,597 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★ 9,587 · Mar 23, 2026 · Updated 2 weeks ago
- Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes. ★ 30,990 · Updated this week
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ★ 32,202 · Sep 30, 2025 · Updated 6 months ago
- 🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch. ★ 33,224 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ★ 41,977 · Updated this week
- Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work! ★ 42,231 · Updated this week
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more ★ 35,311 · Updated this week
- 💫 Industrial-strength Natural Language Processing (NLP) in Python ★ 33,409 · Mar 28, 2026 · Updated last week
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ★ 3,348 · Updated this week
- Tensors and Dynamic neural networks in Python with strong GPU acceleration ★ 98,800 · Updated this week
- An open-source NLP research library, built on PyTorch. ★ 11,893 · Nov 22, 2022 · Updated 3 years ago
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★ 20,895 · Updated this week
- State-of-the-Art Text Embeddings ★ 18,494 · Updated this week
- 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production. ★ 28,758 · Jul 18, 2024 · Updated last year
- Google Research ★ 37,626 · Updated this week
- ☁️ Build multimodal AI applications with cloud-native stack ★ 21,866 · Mar 24, 2025 · Updated last year
- Streamlit: A faster way to build and share data apps. ★ 44,076 · Updated this week
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ★ 2,437 · Updated this week
- Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the mo… ★ 22,973 · Jul 28, 2024 · Updated last year
- The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, a… ★ 25,103 · Updated this week
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. ★ 41,915 · Updated this week
- Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a… ★ 24,720 · Updated this week
- Papers & presentation materials from Hugging Face's internal science day ★ 2,054 · Oct 31, 2020 · Updated 5 years ago
- A library for efficient similarity search and clustering of dense vectors. ★ 39,628 · Updated this week
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ★ 22,077 · Jan 23, 2026 · Updated 2 months ago
- Rich is a Python library for rich text and beautiful formatting in the terminal. ★ 55,973 · Feb 26, 2026 · Updated last month
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ★ 14,361 · Oct 27, 2025 · Updated 5 months ago
- Unsupervised text tokenizer for Neural Network-based text generation. ★ 11,731 · Updated this week
- TensorFlow code and pre-trained models for BERT ★ 39,960 · Jul 23, 2024 · Updated last year
- Label Studio is a multi-type data labeling and annotation tool with standardized output format ★ 26,935 · Updated this week
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages ★ 7,766 · Updated this week
- Train transformer language models with reinforcement learning. ★ 17,863 · Mar 31, 2026 · Updated last week
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" ★ 6,502 · Jan 14, 2026 · Updated 2 months ago
- The fastai deep learning library ★ 27,953 · Updated this week
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto… ★ 17,048 · Updated this week
- FastAPI framework, high performance, easy to learn, fast to code, ready for production ★ 96,920 · Updated this week
- State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enter… ★ 14,768 · Aug 12, 2024 · Updated last year