qanastek / DrBERT
DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains
☆19 Updated last year
Alternatives and similar repositories for DrBERT:
Users who are interested in DrBERT are comparing it to the repositories listed below.
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆65 Updated 2 years ago
- This project develops compact transformer models tailored for clinical text analysis, balancing efficiency and performance for healthcare… ☆18 Updated 11 months ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆93 Updated 2 years ago
- A spaCy custom component that extracts and normalizes temporal expressions ☆54 Updated 2 years ago
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" ☆29 Updated 4 months ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆77 Updated 5 months ago
- ☆83 Updated 2 months ago
- 🤗 Push your spaCy pipelines to the Hugging Face Hub ☆43 Updated 9 months ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 Updated 2 years ago
- ☆21 Updated last month
- Using short models to classify long texts ☆21 Updated last year
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… ☆67 Updated 2 years ago
- Tools for managing datasets for governance and training. ☆82 Updated last month
- Code for equipping pretrained language models (BART, GPT-2, XLNet) with commonsense knowledge for generating implicit knowledge statement… ☆16 Updated 3 years ago
- Fact checking baseline combining dense retrieval and textual entailment ☆28 Updated last month
- A Python library aimed at dissecting and augmenting NER training data. ☆58 Updated last year
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆118 Updated 3 months ago
- ☆15 Updated last year
- Multidocument Summarization for Literature Review Shared Task 2022 ☆29 Updated 2 years ago
- Do Multilingual Language Models Think Better in English? ☆41 Updated last year
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆28 Updated 2 months ago
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ☆56 Updated 7 months ago
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning ☆29 Updated 2 years ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 Updated 2 years ago
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 Updated 10 months ago
- A multi-purpose toolkit for table-to-text generation: web interface, Python bindings, CLI commands. ☆55 Updated 10 months ago
- ☆51 Updated last year
- ☆24 Updated 2 months ago
- codebase release for EMNLP2023 paper publication ☆19 Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆46 Updated last year