🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
★21,648 · Jun 18, 2026 · Updated last week
Alternatives and similar repositories for datasets
Users that are interested in datasets are comparing it to the libraries listed below.
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… ★161,885 · Updated this week
- 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production ★10,834 · Jun 19, 2026 · Updated last week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★9,737 · Updated this week
- Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes. ★31,209 · Jun 10, 2026 · Updated 2 weeks ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ★32,233 · Sep 30, 2025 · Updated 8 months ago
- 🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch. ★33,914 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ★42,586 · Updated this week
- Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work! ★42,989 · Updated this week
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more ★35,879 · Updated this week
- 💫 Industrial-strength Natural Language Processing (NLP) in Python ★33,697 · May 19, 2026 · Updated last month
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ★3,426 · Updated this week
- Tensors and Dynamic neural networks in Python with strong GPU acceleration ★100,915 · Jun 21, 2026 · Updated last week
- An open-source NLP research library, built on PyTorch. ★11,889 · Nov 22, 2022 · Updated 3 years ago
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★21,299 · Updated this week
- State-of-the-Art Embeddings, Retrieval, and Reranking ★18,853 · Updated this week
- 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production. ★29,835 · Jul 18, 2024 · Updated last year
- Google Research ★38,238 · Updated this week
- ☁️ Build multimodal AI applications with cloud-native stack ★21,857 · Mar 24, 2025 · Updated last year
- Streamlit: A faster way to build and share data apps. ★45,050 · Updated this week
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ★2,456 · May 26, 2026 · Updated last month
- Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the mo… ★22,955 · Jul 28, 2024 · Updated last year
- The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, a… ★26,741 · Updated this week
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. ★43,025 · Updated this week
- Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a… ★25,622 · Jun 19, 2026 · Updated last week
- A library for efficient similarity search and clustering of dense vectors. ★40,378 · Jun 21, 2026 · Updated last week
- Papers & presentation materials from Hugging Face's internal science day ★2,053 · Oct 31, 2020 · Updated 5 years ago
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ★22,151 · Jan 23, 2026 · Updated 5 months ago
- Rich is a Python library for rich text and beautiful formatting in the terminal. ★56,658 · Jun 15, 2026 · Updated last week
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ★14,381 · Oct 27, 2025 · Updated 8 months ago
- Unsupervised text tokenizer for Neural Network-based text generation. ★11,925 · Updated this week
- TensorFlow code and pre-trained models for BERT ★40,034 · Jul 23, 2024 · Updated last year
- Label Studio is a multi-type data labeling and annotation tool with standardized output format ★27,663 · Updated this week
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" ★6,527 · Jan 14, 2026 · Updated 5 months ago
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages ★7,815 · Updated this week
- Train transformer language models with reinforcement learning. ★18,701 · Updated this week
- The fastai deep learning library ★28,047 · Updated this week
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto… ★17,456 · Updated this week
- FastAPI framework, high performance, easy to learn, fast to code, ready for production ★99,556 · Jun 21, 2026 · Updated last week
- State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enter… ★14,820 · Aug 12, 2024 · Updated last year