AntoineSimoulin / gpt-fr
Generative Pretrained Transformers for French
☆27 · Updated 2 years ago
Related projects:
- A French sequence-to-sequence pretrained model ☆57 · Updated 2 years ago
- The French summarization dataset introduced in "BARThez: a Skilled Pretrained French Sequence-to-Sequence Model". ☆22 · Updated 3 years ago
- German small and large versions of GPT2. ☆19 · Updated 2 years ago
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP. ☆85 · Updated 2 months ago
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆57 · Updated last year
- ☆16 · Updated last year
- An extension package of 🤗 Datasets that provides support for executing arbitrary SQL queries on HF datasets ☆31 · Updated 7 months ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data; it should also work with any Hugging Face text dataset. ☆91 · Updated last year
- Inference code in PyTorch for GPT-like models, such as PAGnol, a family of models with up to 1.5B parameters, trained on datasets in Fren… ☆20 · Updated last year
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆73 · Updated last week
- A Python library aimed at dissecting and augmenting NER training data. ☆56 · Updated last year
- MAFAND-MT ☆52 · Updated 2 months ago
- A spaCy custom component that extracts and normalizes temporal expressions ☆53 · Updated last year
- RaKUn 2.0 - A fast keyword detection algorithm ☆61 · Updated last month
- Reduce the size of pretrained Hugging Face models via vocabulary trimming. ☆39 · Updated last year
- AfroLID, a powerful neural toolkit for African language identification that covers 517 African languages. ☆27 · Updated last year
- NTREX -- News Test References for MT Evaluation ☆73 · Updated 3 months ago
- Code and models used in "MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases". ☆96 · Updated last year
- Experiments with generating open-source language model assistants ☆97 · Updated last year
- Tools for managing datasets for governance and training. ☆77 · Updated last month
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆51 · Updated 3 months ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆66 · Updated last year
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 · Updated 2 years ago
- A tiny BERT for low-resource monolingual models ☆28 · Updated 4 months ago
- 💫 spaCy wrapper for ConceptNet 💫 ☆88 · Updated last year
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do… ☆72 · Updated 2 months ago
- A package for fine-tuning Transformers with TPUs, written in TensorFlow 2.0+ ☆37 · Updated 3 years ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆64 · Updated 2 years ago
- A library to synthesize text datasets using Large Language Models (LLM) ☆151 · Updated last year
- negate_sentence (A Python module that doesn't negate sentences.) ☆27 · Updated 4 months ago