huggingface / datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
★19,969 · Updated last week
Alternatives and similar repositories for datasets:
Users who are interested in datasets are comparing it to the libraries listed below:
- 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production ★9,592 · Updated last month
- 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX. ★142,871 · Updated this week
- Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes. ★29,300 · Updated this week
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ★31,297 · Updated 3 months ago
- Code for the paper "Language Models are Unsupervised Multitask Learners" ★23,310 · Updated 8 months ago
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★8,608 · Updated last week
- Streamlit - A faster way to build and share data apps. ★38,776 · Updated this week
- An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library. ★8,288 · Updated 3 years ago
- Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work! ★37,426 · Updated this week
- Ongoing research training transformer models at scale ★12,075 · Updated this week
- State-of-the-Art Text Embeddings ★16,444 · Updated this week
- Trax - Deep Learning with Clear Code and Speed ★8,191 · Updated last week
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ★21,069 · Updated last month
- State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enter… ★14,158 · Updated 8 months ago
- Repo for external large-scale work ★6,522 · Updated 11 months ago
- Text preprocessing, representation and visualization from zero to hero. ★2,904 · Updated last year
- GPT-3: Language Models are Few-Shot Learners ★15,759 · Updated 4 years ago
- An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries ★7,157 · Updated this week
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" ★6,329 · Updated last month
- Notebooks using the Hugging Face libraries 🤗 ★4,018 · Updated this week
- CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image ★28,419 · Updated 8 months ago
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages ★7,438 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ★37,916 · Updated this week
- An open-source NLP research library, built on PyTorch. ★11,837 · Updated 2 years ago
- 💫 Industrial-strength Natural Language Processing (NLP) in Python ★31,374 · Updated this week
- The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic … ★3,536 · Updated 2 weeks ago
- BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.) ★7,306 · Updated last year
- Unsupervised text tokenizer for Neural Network-based text generation. ★10,786 · Updated 2 weeks ago
- Natural Language Processing Best Practices & Examples ★6,407 · Updated 2 years ago
- TensorFlow code and pre-trained models for BERT ★39,027 · Updated 8 months ago