qanastek / DrBERT
DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains
★16 · Updated 7 months ago
Related projects:
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ★66 · Updated last year
- ★56 · Updated 7 months ago
- Using short models to classify long texts ★20 · Updated last year
- Do Multilingual Language Models Think Better in English? ★41 · Updated last year
- ★22 · Updated last year
- Preprocessing and analysis for training SNOMED-CT concept embeddings from CORD-19 corpus ★14 · Updated last year
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning ★29 · Updated last year
- Ranking of fine-tuned HF models as base models. ★35 · Updated last year
- codebase release for EMNLP2023 paper publication ★19 · Updated 6 months ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ★58 · Updated 2 years ago
- A HuggingFace compatible xLSTM trainer. ★57 · Updated last week
- A Python library aimed at dissecting and augmenting NER training data. ★56 · Updated last year
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ★46 · Updated 5 months ago
- This project develops compact transformer models tailored for clinical text analysis, balancing efficiency and performance for healthcare… ★18 · Updated 5 months ago
- EDS-PDF is a generic, pure-Python framework for text extraction from PDF documents. It provides the machinery to use rule- or machine-lea… ★38 · Updated last month
- SpaCy wrapper for ConceptNet ★88 · Updated last year
- Efficiently find the best-suited language model (LM) for your NLP task ★12 · Updated this week
- A spaCy custom component that extracts and normalizes temporal expressions ★53 · Updated last year
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ★31 · Updated 2 weeks ago
- ★13 · Updated last year
- Experiments for XLM-V Transformers Integration ★13 · Updated last year
- triple-encoders is a library for contextualizing distributed Sentence Transformers representations. ★12 · Updated 2 weeks ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ★73 · Updated last week
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ★51 · Updated last month
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ★32 · Updated last year
- Scripts to convert datasets from various sources to Hugging Face Datasets. ★57 · Updated last year
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ★21 · Updated last month
- GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings ★29 · Updated 6 months ago
- Code for equipping pretrained language models (BART, GPT-2, XLNet) with commonsense knowledge for generating implicit knowledge statement… ★16 · Updated 3 years ago
- Annotated corpus + evaluation metrics for text anonymisation ★48 · Updated 7 months ago