nazmulkazi / dataset_automated_medical_transcription
Dataset for training a machine learning model to automatically generate psychiatric case notes from doctor-patient conversations.
☆50 · Updated last year
Related projects
Alternatives and complementary repositories for dataset_automated_medical_transcription
- Dataset for Natural Language Processing using a corpus of medical transcriptions and custom-generated clinical stop words and vocabulary. ☆84 · Updated 4 years ago
- Sentence tokenizer for clinical/medical text. ☆25 · Updated 5 months ago
- A deidentifier / deidentification pipeline developed by Stanford and Penn as part of the MIDRC organization. ☆78 · Updated 5 months ago
- Zero-shot Audio Classification using Whisper ☆74 · Updated last year
- Clinical text summarization by adapting large language models ☆120 · Updated 3 months ago
- Robust de-identification of medical notes using transformer architectures ☆45 · Updated 2 years ago
- Supplementary material for "Understanding Parameter-Efficient Finetuning of Large Language Models: From Prefix Tuning to Adapters" ☆43 · Updated last year
- Stable timestamps and confidence score for words of OpenAI's Whisper outputs down to word-level. ☆25 · Updated last year
- This project develops compact transformer models tailored for clinical text analysis, balancing efficiency and performance for healthcare… ☆18 · Updated 7 months ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆72 · Updated last year
- 🎯 Speech Recognition Challenge by Speech Lab - IIT Madras ☆11 · Updated 4 years ago
- A Python Natural Language Processing Toolkit for Medical Text Generation ☆70 · Updated 3 weeks ago
- Using short models to classify long texts ☆20 · Updated last year
- A TextTiling-based algorithm for text segmentation (aka topic segmentation) that uses neural sentence encoders, as well as extractive sum… ☆42 · Updated last year
- A library for squeakily cleaning and filtering language datasets. ☆45 · Updated last year
- MedAlign is a clinician-generated dataset for instruction following with electronic medical records. ☆89 · Updated last year
- GPTNERMED is a language model-generated, synthetic dataset and an open neural NER model for medical entities designed for German data. ☆15 · Updated last year
- Audio Preprocessing and finetuning of wav2vec2-large-xlsr model on AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF Data. ☆17 · Updated 3 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆62 · Updated 8 months ago
- Self-verification for LLMs. ☆62 · Updated last year
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆57 · Updated 2 years ago
- Speaker diarization service ☆19 · Updated this week
- A Streamlit application to visualize sentence embeddings ☆20 · Updated last year
- A corpus of textual data corresponding to synthetic clinical encounters, including each encounter’s dialogue transcript and clinical note… ☆29 · Updated last year
- Biomedical Data-to-Text Generation via Fine-Tuning Transformers ☆29 · Updated 2 years ago
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation ☆134 · Updated 10 months ago