🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
☆21,268 · Mar 11, 2026 · Updated last week
Alternatives and similar repositories for datasets
Users who are interested in datasets compare it to the libraries listed below.
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… ☆157,783 · Updated this week
- Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes. ☆30,926 · Mar 10, 2026 · Updated last week
- 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production ☆10,529 · Feb 28, 2026 · Updated 2 weeks ago
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ☆9,545 · Mar 11, 2026 · Updated last week
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆32,191 · Sep 30, 2025 · Updated 5 months ago
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆41,807 · Updated this week
- Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work! ☆42,001 · Updated this week
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more ☆35,108 · Updated this week
- 🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch. ☆33,005 · Updated this week
- ☁️ Build multimodal AI applications with cloud-native stack ☆21,849 · Mar 24, 2025 · Updated 11 months ago
- 💫 Industrial-strength Natural Language Processing (NLP) in Python ☆33,315 · Mar 9, 2026 · Updated last week
- Tensors and Dynamic neural networks in Python with strong GPU acceleration ☆98,243 · Updated this week
- 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production. ☆28,711 · Jul 18, 2024 · Updated last year
- An open-source NLP research library, built on PyTorch. ☆11,893 · Nov 22, 2022 · Updated 3 years ago
- Google Research ☆37,452 · Updated this week
- Streamlit: A faster way to build and share data apps. ☆43,845 · Updated this week
- State-of-the-Art Text Embeddings ☆18,390 · Updated this week
- The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, a… ☆24,730 · Updated this week
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. ☆41,773 · Updated this week
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ☆3,325 · Updated this week
- Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a… ☆24,519 · Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ☆20,809 · Updated this week
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ☆22,046 · Jan 23, 2026 · Updated last month
- Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the mo… ☆22,975 · Jul 28, 2024 · Updated last year
- Rich is a Python library for rich text and beautiful formatting in the terminal. ☆55,777 · Feb 26, 2026 · Updated 2 weeks ago
- A library for efficient similarity search and clustering of dense vectors. ☆39,403 · Updated this week
- Label Studio is a multi-type data labeling and annotation tool with standardized output format ☆26,726 · Updated this week
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ☆14,354 · Oct 27, 2025 · Updated 4 months ago
- FastAPI framework, high performance, easy to learn, fast to code, ready for production ☆96,291 · Updated this week
- The fastai deep learning library ☆27,907 · Feb 26, 2026 · Updated 2 weeks ago
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto… ☆16,918 · Updated this week
- TensorFlow code and pre-trained models for BERT ☆39,917 · Jul 23, 2024 · Updated last year
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ☆2,429 · Mar 10, 2026 · Updated last week
- State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enter… ☆14,745 · Aug 12, 2024 · Updated last year
- Unsupervised text tokenizer for Neural Network-based text generation. ☆11,686 · Mar 1, 2026 · Updated 2 weeks ago
- 💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, … ☆21,086 · Jan 29, 2026 · Updated last month
- The agent engineering platform ☆129,503 · Updated this week
- A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training ☆23,919 · Aug 15, 2024 · Updated last year
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages ☆7,735 · Mar 10, 2026 · Updated last week