This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters".
☆38 · Jul 31, 2024 · Updated last year
Alternatives and similar repositories for PETL_AST
Users interested in PETL_AST are comparing it to the libraries listed below.
- ☆13 · Aug 23, 2024 · Updated last year
- ASiT: Audio Spectrogram vIsion Transformer for General Audio Representation ☆28 · Mar 10, 2024 · Updated last year
- [AAAI 2024] DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification ☆12 · Mar 10, 2025 · Updated 11 months ago
- (ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation ☆15 · Apr 29, 2025 · Updated 10 months ago
- (SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition ☆13 · Oct 22, 2024 · Updated last year
- ☆17 · Jan 31, 2023 · Updated 3 years ago
- Polyphonic Sound Detection Score (PSDS) ☆15 · Jan 20, 2020 · Updated 6 years ago
- PyTorch implementation of paper "High Fidelity Speech Regeneration With Application to Speech Enhancement" ☆15 · May 8, 2021 · Updated 4 years ago
- (ICASSP 2024) Official Implementation of "Stethoscope-guided Supervised Contrastive Learning for Cross-domain Adaptation on Respiratory So… ☆17 · Dec 5, 2024 · Updated last year
- ☆17 · Jun 11, 2025 · Updated 8 months ago
- ☆17 · Mar 1, 2024 · Updated 2 years ago
- Multi-modal transformer approach for natural language query based joint video summarization and highlight detection ☆17 · May 23, 2024 · Updated last year
- Histogram Layer Time Delay Neural Networks For Passive Sonar Classification ☆19 · Jan 21, 2026 · Updated last month
- The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection" ☆474 · Sep 18, 2025 · Updated 5 months ago
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT ☆40 · Aug 29, 2024 · Updated last year
- Official implementation of A cappella: Audio-visual Singing Voice Separation, from BMVC21 ☆16 · May 14, 2022 · Updated 3 years ago
- ☆46 · Feb 16, 2023 · Updated 3 years ago
- A project for tri-modal LLM benchmarking and instruction tuning. ☆56 · Mar 27, 2025 · Updated 11 months ago
- The official implementation of the paper "A spatio-temporal deep learning approach for underwater acoustic signals classification". In th… ☆30 · Apr 6, 2023 · Updated 2 years ago
- CTC decoder with hotwords for ASR. ☆34 · Apr 13, 2025 · Updated 10 months ago
- ☆24 · Mar 30, 2024 · Updated last year
- [ACII 2023] PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Spe… ☆60 · Jul 1, 2024 · Updated last year
- A dataset for Audio-Visual Sound Event Detection in Movies ☆26 · Jan 23, 2023 · Updated 3 years ago
- AAAI-24 Decoupled Contrastive Learning for Long-Tailed Recognition ☆32 · May 23, 2024 · Updated last year
- Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens… ☆46 · Jun 12, 2025 · Updated 8 months ago
- ☆36 · Nov 15, 2023 · Updated 2 years ago
- ☆13 · Oct 5, 2025 · Updated 4 months ago
- Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification (INTERSPEECH 2023) ☆72 · Mar 11, 2025 · Updated 11 months ago
- Code for the Interspeech 2024 paper "MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting" ☆45 · Jan 24, 2026 · Updated last month
- ☆32 · Aug 10, 2022 · Updated 3 years ago
- An implementation of Speech Emotion Recognition, based on HuBERT model, training with PyTorch and HuggingFace framework, and fine-tuning … ☆33 · May 18, 2022 · Updated 3 years ago
- Code for the paper "Jukebox: A Generative Model for Music" ☆38 · May 1, 2021 · Updated 4 years ago
- Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an… ☆35 · Jun 20, 2023 · Updated 2 years ago
- 📦 A collection of pastable code gathered from past projects ☆12 · Sep 9, 2024 · Updated last year
- Build an AI bot in Discord to serve user's personalized reports on what's up in tech ☆28 · Sep 14, 2025 · Updated 5 months ago
- Dataset created for the Power Line Insulators Inspection Detections ☆10 · Jul 2, 2020 · Updated 5 years ago
- Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand". ☆469 · Apr 24, 2024 · Updated last year
- Unsupervised Rhythm Modeling for Voice Conversion ☆86 · Aug 3, 2023 · Updated 2 years ago
- ☆38 · Jan 17, 2025 · Updated last year