huggingface / tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
★9,052 · Updated this week
Related projects
Alternatives and complementary repositories for tokenizers
- Unsupervised text tokenizer for Neural Network-based text generation. ★10,295 · Updated 2 weeks ago
- Ongoing research training transformer models at scale ★10,595 · Updated this week
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" ★6,178 · Updated 2 months ago
- An open-source NLP research library, built on PyTorch. ★11,759 · Updated last year
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ★30,552 · Updated last month
- State-of-the-Art Text Embeddings ★15,368 · Updated this week
- A library for efficient similarity search and clustering of dense vectors. ★31,488 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★7,958 · Updated this week
- Trax: Deep Learning with Clear Code and Speed ★8,101 · Updated 2 months ago
- Open Source Neural Machine Translation and (Large) Language Models in PyTorch ★6,773 · Updated 4 months ago
- The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic … ★3,492 · Updated 2 weeks ago
- Train transformer language models with reinforcement learning. ★10,086 · Updated this week
- BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.) ★6,952 · Updated last year
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto… ★12,150 · Updated this week
- XLNet: Generalized Autoregressive Pretraining for Language Understanding ★6,182 · Updated last year
- PyTorch original implementation of Cross-lingual Language Model Pretraining. ★2,892 · Updated last year
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ★13,946 · Updated this week
- Language-Agnostic SEntence Representations ★3,600 · Updated 6 months ago
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★16,471 · Updated this week
- Papers & presentation materials from Hugging Face's internal science day ★2,037 · Updated 4 years ago
- 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. ★135,166 · Updated this week
- ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator ★14,722 · Updated this week
- A natural language modeling framework based on PyTorch ★6,338 · Updated 2 years ago
- Fast and memory-efficient exact attention ★14,279 · Updated this week
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations ★3,247 · Updated last year
- Development repository for the Triton language and compiler ★13,443 · Updated this week
- Open source annotation tool for machine learning practitioners. ★9,586 · Updated this week
- The implementation of DeBERTa ★1,991 · Updated last year
- A system for quickly generating training data with weak supervision ★5,812 · Updated 6 months ago
- A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training ★20,199 · Updated 3 months ago