huggingface / datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
★ 20,306 · Updated this week
Alternatives and similar repositories for datasets
Users who are interested in datasets compare it to the libraries listed below.
- Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes. ★ 29,665 · Updated this week
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… ★ 146,154 · Updated this week
- 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production ★ 9,836 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★ 8,875 · Updated this week
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ★ 31,582 · Updated 2 weeks ago
- Unsupervised text tokenizer for Neural Network-based text generation. ★ 11,022 · Updated 2 months ago
- This repository contains demos I made with the Transformers library by HuggingFace. ★ 11,016 · Updated last month
- State-of-the-Art Text Embeddings ★ 17,024 · Updated 2 weeks ago
- Ongoing research training transformer models at scale ★ 12,701 · Updated this week
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more ★ 32,619 · Updated this week
- Visualizer for neural network, deep learning and machine learning models ★ 30,536 · Updated this week
- State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enter… ★ 14,349 · Updated 10 months ago
- Label Studio is a multi-type data labeling and annotation tool with standardized output format ★ 22,716 · Updated this week
- An open-source NLP research library, built on PyTorch. ★ 11,852 · Updated 2 years ago
- Development repository for the Triton language and compiler ★ 15,989 · Updated this week
- BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.) ★ 7,491 · Updated 3 weeks ago
- Open source annotation tool for machine learning practitioners. ★ 10,091 · Updated 2 weeks ago
- Open Source Neural Machine Translation and (Large) Language Models in PyTorch ★ 6,896 · Updated 3 months ago
- The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more! ★ 7,814 · Updated this week
- Fast and memory-efficient exact attention ★ 18,043 · Updated this week
- Open source platform for the machine learning lifecycle ★ 21,068 · Updated this week
- Train transformer language models with reinforcement learning. ★ 14,366 · Updated this week
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ★ 14,208 · Updated last week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★ 18,861 · Updated this week
- Hydra is a framework for elegantly configuring complex applications ★ 9,441 · Updated last month
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. ★ 37,737 · Updated this week
- Transformer related optimization, including BERT, GPT ★ 6,219 · Updated last year
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto… ★ 14,924 · Updated this week
- Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stre… ★ 8,686 · Updated 2 weeks ago
- A framework for training and evaluating AI models on a variety of openly available dialogue datasets. ★ 10,586 · Updated last year