huggingface / datasets
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
★20,801 · Updated last week
Alternatives and similar repositories for datasets
Users who are interested in datasets compare it to the libraries listed below.
- 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production ★10,206 · Updated 3 weeks ago
- A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★9,256 · Updated 2 weeks ago
- Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes. ★30,382 · Updated this week
- Unsupervised text tokenizer for Neural Network-based text generation. ★11,411 · Updated this week
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… ★151,997 · Updated this week
- An open-source NLP research library, built on PyTorch. ★11,882 · Updated 2 years ago
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto… ★16,017 · Updated last week
- Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data … ★11,062 · Updated 3 weeks ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ★31,918 · Updated last month
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ★14,316 · Updated last week
- Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stre… ★8,878 · Updated 2 weeks ago
- Ongoing research training transformer models at scale ★14,082 · Updated this week
- Code for the paper "Language Models are Unsupervised Multitask Learners" ★24,338 · Updated last year
- Build and share delightful machine learning apps, all in Python. Star to support our work! ★40,389 · Updated this week
- 💫 Industrial-strength Natural Language Processing (NLP) in Python ★32,758 · Updated last week
- Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conve… ★4,222 · Updated 2 months ago
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more ★33,852 · Updated last week
- BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.) ★7,731 · Updated 5 months ago
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" ★6,446 · Updated 6 months ago
- Streamlit: A faster way to build and share data apps. ★42,035 · Updated this week
- Papers & tech blogs by companies sharing their work on data science & machine learning in production. ★28,482 · Updated last year
- The official Python client for the Hugging Face Hub. ★3,030 · Updated this week
- Google Research ★36,659 · Updated last week
- Data augmentation for NLP ★4,627 · Updated last year
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages ★7,642 · Updated this week
- tiktoken is a fast BPE tokeniser for use with OpenAI's models. ★16,404 · Updated last month
- State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enter… ★14,542 · Updated last year
- A library for efficient similarity search and clustering of dense vectors. ★37,735 · Updated last week
- The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more! ★8,172 · Updated last week
- 🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch. ★31,453 · Updated this week