🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
★ 21,228 · Updated this week
Alternatives and similar repositories for datasets
Users who are interested in datasets are comparing it to the libraries listed below.
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… · ★ 157,071 · Updated this week
- Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes. · ★ 30,860 · Feb 21, 2026 · Updated last week
- 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production · ★ 10,485 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… · ★ 9,513 · Updated this week
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. · ★ 32,170 · Sep 30, 2025 · Updated 5 months ago
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. · ★ 41,648 · Updated this week
- Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work! · ★ 41,855 · Updated this week
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more · ★ 34,940 · Updated this week
- 🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch. · ★ 32,873 · Updated this week
- ☁️ Build multimodal AI applications with cloud-native stack · ★ 21,832 · Mar 24, 2025 · Updated 11 months ago
- 💫 Industrial-strength Natural Language Processing (NLP) in Python · ★ 33,254 · Nov 27, 2025 · Updated 3 months ago
- Tensors and Dynamic neural networks in Python with strong GPU acceleration · ★ 97,688 · Updated this week
- 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production. · ★ 28,698 · Jul 18, 2024 · Updated last year
- An open-source NLP research library, built on PyTorch. · ★ 11,889 · Nov 22, 2022 · Updated 3 years ago
- Google Research · ★ 37,367 · Updated this week
- Streamlit – A faster way to build and share data apps. · ★ 43,634 · Updated this week
- State-of-the-Art Text Embeddings · ★ 18,298 · Feb 20, 2026 · Updated last week
- The open source developer platform to build AI agents and models with confidence. Enhance your AI applications with end-to-end tracking, … · ★ 24,365 · Updated this week
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. · ★ 41,516 · Updated this week
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… · ★ 3,296 · Feb 9, 2026 · Updated 2 weeks ago
- Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a… · ★ 24,295 · Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. · ★ 20,678 · Updated this week
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities · ★ 22,033 · Jan 23, 2026 · Updated last month
- Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the mo… · ★ 22,981 · Jul 28, 2024 · Updated last year
- Rich is a Python library for rich text and beautiful formatting in the terminal. · ★ 55,569 · Feb 19, 2026 · Updated last week
- A library for efficient similarity search and clustering of dense vectors. · ★ 39,195 · Updated this week
- Label Studio is a multi-type data labeling and annotation tool with standardized output format · ★ 26,505 · Updated this week
- A very simple framework for state-of-the-art Natural Language Processing (NLP) · ★ 14,359 · Oct 27, 2025 · Updated 4 months ago
- FastAPI framework, high performance, easy to learn, fast to code, ready for production · ★ 95,554 · Updated this week
- The fastai deep learning library · ★ 27,864 · Feb 14, 2026 · Updated 2 weeks ago
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto… · ★ 16,807 · Updated this week
- TensorFlow code and pre-trained models for BERT · ★ 39,875 · Jul 23, 2024 · Updated last year
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. · ★ 2,419 · Jan 20, 2026 · Updated last month
- State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enter… · ★ 14,735 · Aug 12, 2024 · Updated last year
- Unsupervised text tokenizer for Neural Network-based text generation. · ★ 11,668 · Updated this week
- 💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, … · ★ 21,070 · Jan 29, 2026 · Updated last month
- 🦜🔗 The platform for reliable agents. · ★ 127,192 · Updated this week
- A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training · ★ 23,667 · Aug 15, 2024 · Updated last year
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages · ★ 7,729 · Updated this week