qanastek / DrBERT
DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains
☆19 · Updated last year
Alternatives and similar repositories for DrBERT:
Users who are interested in DrBERT are comparing it to the repositories listed below.
- A spaCy custom component that extracts and normalizes temporal expressions ☆54 · Updated 2 years ago
- Using short models to classify long texts ☆21 · Updated 2 years ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆65 · Updated 2 years ago
- ☆22 · Updated 2 months ago
- Are foundation LMs multilingual knowledge bases? (EMNLP 2023) ☆19 · Updated last year
- ☆51 · Updated 3 years ago
- Do Multilingual Language Models Think Better in English? ☆41 · Updated last year
- A High-level Library for Named Entity Recognition in Python. ☆23 · Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. ☆93 · Updated 2 years ago
- Ranking of fine-tuned HF models as base models. ☆35 · Updated last year
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆108 · Updated 10 months ago
- This project develops compact transformer models tailored for clinical text analysis, balancing efficiency and performance for healthcare… ☆18 · Updated last year
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 · Updated 11 months ago
- Embedding Recycling for Language models ☆38 · Updated last year
- A Python library aimed at dissecting and augmenting NER training data. ☆58 · Updated last year
- XAI based human-in-the-loop framework for automatic rule-learning. ☆48 · Updated 9 months ago
- Fact checking baseline combining dense retrieval and textual entailment ☆28 · Updated 2 months ago
- Multidocument Summarization for Literature Review Shared Task 2022 ☆29 · Updated 2 years ago
- KIND: an Italian Multi-Domain Dataset for Named Entity Recognition ☆15 · Updated last year
- ☆87 · Updated 3 months ago
- Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E… ☆21 · Updated 2 years ago
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆40 · Updated 3 years ago
- German Alpaca Dataset (Cleaned + Translated) ☆24 · Updated 2 years ago
- allennlp-light is a port of AllenNLP's core modules and nn portions into a standalone package with minimum dependencies ☆56 · Updated 2 years ago
- German dataset for DPR model training ☆18 · Updated 8 months ago
- SQuARE: Software for question answering research. ☆75 · Updated 9 months ago
- codebase release for EMNLP2023 paper publication ☆19 · Updated last year
- EDS-Pseudo is a hybrid model for detecting personally identifying entities in clinical reports ☆52 · Updated last week
- Semantically Structured Sentence Embeddings ☆65 · Updated 5 months ago
- The CleanCoNLL dataset from our EMNLP 2023 paper where we corrected annotation errors and inconsistencies in CoNLL-03. ☆23 · Updated 9 months ago