huggingface / datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
★ 20,389 · Updated this week
Alternatives and similar repositories for datasets
Users who are interested in datasets are comparing it to the libraries listed below.
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… ★ 147,239 · Updated this week
- Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes. ★ 29,799 · Updated this week
- 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production ★ 9,904 · Updated this week
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more ★ 32,853 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★ 8,951 · Updated this week
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ★ 31,629 · Updated last month
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto… ★ 15,133 · Updated this week
- A framework for training and evaluating AI models on a variety of openly available dialogue datasets. ★ 10,599 · Updated last year
- Flax is a neural network library for JAX that is designed for flexibility. ★ 6,684 · Updated this week
- 🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX. ★ 29,863 · Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★ 19,080 · Updated last week
- State-of-the-Art Text Embeddings ★ 17,150 · Updated this week
- Google Research ★ 36,013 · Updated last week
- Hydra is a framework for elegantly configuring complex applications ★ 9,511 · Updated this week
- Graph Neural Network Library for PyTorch ★ 22,621 · Updated this week
- Trax - Deep Learning with Clear Code and Speed ★ 8,235 · Updated 3 months ago
- AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file convert… ★ 21,569 · Updated this week
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages ★ 7,526 · Updated this week
- BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.) ★ 7,549 · Updated last month
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ★ 39,392 · Updated this week
- A modular framework for vision & language multimodal research from Facebook AI Research (FAIR) ★ 5,581 · Updated 2 months ago
- A data augmentations library for audio, image, text, and video. ★ 5,023 · Updated this week
- An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model c… ★ 14,229 · Updated last year
- Unsupervised text tokenizer for Neural Network-based text generation. ★ 11,087 · Updated last week
- Development repository for the Triton language and compiler ★ 16,198 · Updated this week
- Hackable and optimized Transformers building blocks, supporting a composable construction. ★ 9,751 · Updated this week
- Open standard for machine learning interoperability ★ 19,264 · Updated this week
- An open-source NLP research library, built on PyTorch. ★ 11,858 · Updated 2 years ago
- Fast and memory-efficient exact attention ★ 18,448 · Updated last week
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ★ 21,552 · Updated 2 weeks ago