getalp/Flaubert

getalp / Flaubert

Unsupervised Language Model Pre-training for French

☆246

Alternatives and similar repositories for Flaubert

Users who are interested in Flaubert are comparing it to the repositories listed below.

TheophileBlard / french-sentiment-analysis-with-bert
How good is BERT? Comparing BERT to other state-of-the-art approaches on a French sentiment analysis dataset
☆157 · Feb 16, 2023 · Updated 3 years ago
UniversalDependencies / UD_French-Sequoia
Data from the Sequoia treebank.
☆11 · May 6, 2026 · Updated 2 months ago
getalp / UFSAC
UFSAC is a resource containing all WordNet Sense Annotated Corpora, and a Java library for manipulating them
☆39 · May 17, 2022 · Updated 4 years ago
opinionscience / InstructionFr
A repository of instructions in French to fine-tune LLMs
☆16 · Jun 23, 2023 · Updated 3 years ago
nyu-dl / AMMI-2019-NLP-Part2
☆16 · Dec 8, 2022 · Updated 3 years ago
Cour-de-cassation / moteurNER
Communication about the pseudonymization engine of the Cour de Cassation
☆19 · Feb 14, 2023 · Updated 3 years ago
aquadzn / gpt2-french
GPT-2 French demo | Démo française de GPT-2
☆68 · Jan 13, 2021 · Updated 5 years ago
moussaKam / BARThez
A French sequence-to-sequence pretrained model
☆63 · Aug 27, 2022 · Updated 3 years ago
gamebusterz / French-Sentiment-Analysis-Dataset
A collection of over 1.5 million tweets translated to French, with their sentiment.
☆35 · May 18, 2017 · Updated 9 years ago
getalp / disambiguate
Disambiguate is a tool for training and using state-of-the-art neural WSD models
☆60 · Jul 12, 2025 · Updated 11 months ago
boudinfl / kea
A tokenizer for French
☆14 · Apr 18, 2013 · Updated 13 years ago
curto2 / mckernel
McKernel: A Library for Approximate Kernel Expansions in Log-linear Time.
☆13 · Sep 3, 2022 · Updated 3 years ago
bnosac / tokenizers.bpe
R package for Byte Pair Encoding based on YouTokenToMe
☆17 · Jun 13, 2026 · Updated 3 weeks ago
coteries / cedille-ai
✒️ Cedille is a large French language model (6B), released under an open-source license
☆202 · Feb 9, 2022 · Updated 4 years ago
etalab / calculette-impots-m-language-parser
Parsed income tax calculator
☆15 · Feb 19, 2020 · Updated 6 years ago
SCCH-KVS / training-engine
The stand-alone training engine module for the ALOHA.eu project.
☆15 · Oct 27, 2019 · Updated 6 years ago
unimorph / wiktionary-tools
Tools for scraping, annotating, and parsing morphological information from Wiktionary
☆15 · Oct 19, 2019 · Updated 6 years ago
pommedeterresautee / projector
Project dense vector text representations onto a 2D plane
☆16 · Mar 7, 2019 · Updated 7 years ago
SapienzaNLP / clubert
Distribution of word meanings in Wikipedia for English, Italian, French, German and Spanish.
☆10 · Jan 4, 2021 · Updated 5 years ago
SapienzaNLP / mcl-wic
SemEval-2021 Multilingual and Cross-lingual Word-in-Context Task
☆18 · May 27, 2021 · Updated 5 years ago
naverlabseurope / ALPS2024-MT-LAB
CD20200004 from 01/01/2021 to 31/12/2023 - LIG UGA - Python Notebook and Models for the MT Lab @ ALPS 2022
☆13 · Apr 1, 2024 · Updated 2 years ago
getalp / wikIR
A Python tool for building large-scale Wikipedia-based Information Retrieval datasets
☆47 · Apr 28, 2021 · Updated 5 years ago
sereprz / ShakespeareTextAnalysis
☆20 · May 29, 2016 · Updated 10 years ago
mattia-decao / hiero-transformer
☆15 · Nov 3, 2024 · Updated last year
boberle / cofr
coFR: COreference resolution tool for FRench (and singletons).
☆27 · Jun 7, 2020 · Updated 6 years ago
LeBenchmark / Interspeech2021
This repository describes our reproducible framework for assessing self-supervised representation learning from speech
☆52 · Oct 8, 2021 · Updated 4 years ago
edwardjhu / improved_wasserstein
Code for our ICLR Trustworthy ML 2020 workshop paper "Improved Image Wasserstein Attacks and Defenses"
☆14 · Apr 28, 2020 · Updated 6 years ago
kasparvonbeelen / ghi_python
Programming for Historians
☆17 · Sep 12, 2022 · Updated 3 years ago
pixano / pixano_legacy.github.io
Pixano website
☆10 · Apr 7, 2022 · Updated 4 years ago
clulab / odin-examples
Small examples showing how to use Odin for various IE tasks
☆16 · Jun 1, 2017 · Updated 9 years ago
BayesForDays / nontology
Matrix tools for building and inspecting latent spaces
☆26 · Aug 19, 2018 · Updated 7 years ago
GazePlay / GazePlay
Gazeplay is a free and open-source software which gathers several mini-games playable with an eye-tracker. Last version includes almost 6…
☆50 · Jul 2, 2026 · Updated last week
oeuvres / alix
A Lucene Indexer for XML, with lexical analysis (lemmatization for French)
☆18 · Updated this week
globalwordnet / schemas
WordNet-LMF formats
☆28 · Updated this week
bloomberg / koan
A word2vec negative sampling implementation with correct CBOW update.
☆261 · Nov 8, 2021 · Updated 4 years ago
CentreForDigitalHumanities / tscan
T-scan: an analysis tool for Dutch texts to assess the complexity of the text, based on original work by Rogier Kraf
☆19 · May 28, 2025 · Updated last year
facebookresearch / muss
Code and models used in "MUSS Multilingual Unsupervised Sentence Simplification by Mining Paraphrases".
☆98 · Feb 2, 2023 · Updated 3 years ago
ELS-RD / anonymisation
Anonymization of legal cases (Fr) based on Flair embeddings
☆89 · Dec 9, 2020 · Updated 5 years ago
philschulz / stochastic-decoder
Code and workflow for the reproduction of the stochastic decoder experiments.
☆15 · May 25, 2018 · Updated 8 years ago