the-anonymous-bs / av-SALMONN
av-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
☆13 · May 8, 2024 · Updated last year
Alternatives and similar repositories for av-SALMONN
Users that are interested in av-SALMONN are comparing it to the libraries listed below
- ☆24 · Sep 20, 2024 · Updated last year
- ☆86 · Jul 31, 2025 · Updated 6 months ago
- The code and data for "Summary-Oriented Vision Modeling for Multimodal Abstractive Summarization" ☆11 · May 16, 2023 · Updated 2 years ago
- Implementation code for the paper: Multimodal Sentiment Analysis with Mutual Information-based Disentangled Representation Learning ☆18 · May 8, 2025 · Updated 9 months ago
- Group project for 11-785 Fall 2022 @ CMU ☆10 · Dec 15, 2022 · Updated 3 years ago
- ☆10 · Jan 18, 2024 · Updated 2 years ago
- Code for paper "Cross-Domain Slot Filling as Machine Reading Comprehension" in IJCAI 2021 ☆11 · Aug 24, 2021 · Updated 4 years ago
- We present a study of a neural network based method for speech emotion recognition, using audio-only features. In the studied scheme, the… ☆11 · Jul 24, 2024 · Updated last year
- ☆14 · Feb 22, 2025 · Updated 11 months ago
- The implementations of some works from Davar-Lab. Currently we have the code of Text Perceptron (AAAI 2020). Some works' code will be pub… ☆11 · Mar 26, 2021 · Updated 4 years ago
- Multimodal SER Model meant to be trained on recognising emotions from speech (text + acoustic data). Fine-tuned the DeBERTaV3 model, resp… ☆11 · Jun 19, 2024 · Updated last year
- (ECCV2022) EAGAN: Efficient Two-stage Evolutionary Architecture Search for GANs ☆12 · Sep 15, 2022 · Updated 3 years ago
- Substitute alternative spellings of special characters (e.g. German umlauts [ae, oe, ue] and [ss]) with their correct versions (ä, ö, ü, … ☆11 · Nov 24, 2024 · Updated last year
- ☆10 · Oct 16, 2025 · Updated 4 months ago
- ☆11 · Nov 11, 2022 · Updated 3 years ago
- Awesome Multimodal Fusion in Speech Emotion Recognition ☆13 · Nov 11, 2025 · Updated 3 months ago
- ☆10 · Jul 16, 2024 · Updated last year
- [ECCVW/TWYN 2024 - Best Workshop Paper] Are CLIP features all you need for Universal Synthetic Image Origin Attribution? ☆12 · Feb 1, 2025 · Updated last year
- This repository contains the official implementation (PyTorch) of "Multimodal Forgery Detection Using Ensemble Learning" proposed in APSI… ☆10 · Jan 4, 2023 · Updated 3 years ago
- A lecture summarization tool that uses AI and computer vision to summarize and index videos ☆11 · Dec 8, 2022 · Updated 3 years ago
- Implementation of the paper Deep Back Projection Network ☆10 · Jul 23, 2018 · Updated 7 years ago
- Datasets of audio adversarial examples for deep speech recognition systems and Python code of a detection system ☆12 · May 6, 2023 · Updated 2 years ago
- ☆13 · Oct 17, 2020 · Updated 5 years ago
- ☆11 · Oct 24, 2022 · Updated 3 years ago
- [CVPR 2024] Code and datasets for 'Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos' ☆13 · Jun 16, 2024 · Updated last year
- [TPAMI 2024] The official implementation of "Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clu… ☆11 · Mar 19, 2024 · Updated last year
- ☆10 · Dec 22, 2023 · Updated 2 years ago
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆59 · Apr 3, 2025 · Updated 10 months ago
- Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023) ☆12 · Jun 1, 2023 · Updated 2 years ago
- baseline method for CROCS 2024 ☆10 · Jan 24, 2024 · Updated 2 years ago
- NAR-BERT-ASR ☆10 · Sep 27, 2021 · Updated 4 years ago
- Code recipe for "Multimodal One-Shot Learning of Speech and Images" ☆11 · Nov 21, 2018 · Updated 7 years ago
- Source code to "SliTraNet: Automatic Detection of Slide Transitions in Lecture Videos using Convolutional Neural Networks" ☆10 · Dec 17, 2023 · Updated 2 years ago
- [ICCV 2023] Accurate and Fast Compressed Video Captioning ☆52 · Jul 28, 2025 · Updated 6 months ago
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition ☆11 · Mar 14, 2025 · Updated 11 months ago
- ☆23 · Dec 6, 2025 · Updated 2 months ago
- Official PyTorch implementation for the paper 'SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients' ☆17 · Jan 12, 2022 · Updated 4 years ago
- A handwritten chemical structure image dataset named EDU-CHEMC, which consists of 52,987 handwritten molecular structure images … ☆14 · May 12, 2025 · Updated 9 months ago
- The code for Multi-Scale Receptive Field Graph Model for Emotion Recognition in Conversations ☆11 · Jan 17, 2023 · Updated 3 years ago