Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
☆25Oct 1, 2024Updated last year
Alternatives and similar repositories for action2sound
Users who are interested in action2sound are comparing it to the repositories listed below
- Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence☆19Jun 14, 2024Updated last year
- Official implementation of "Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound". IEEE TASLP 20…☆17Feb 27, 2026Updated last week
- This is the code and dataset repo for Interspeech 2024 paper "Target conversation extraction: Source separation using turn-taking dynamic…☆55Aug 15, 2025Updated 6 months ago
- PyTorch implementation of our paper: Audio-Visual Speech Separation with Visual Features Enhanced by Adversarial Training.☆18Jul 11, 2022Updated 3 years ago
- A python implementation of “Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization” [TASLP 2021]☆27Feb 11, 2023Updated 3 years ago
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)☆58Apr 17, 2024Updated last year
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family.☆34Apr 1, 2025Updated 11 months ago
- Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation☆32Mar 8, 2024Updated last year
- ☆29Feb 8, 2026Updated 3 weeks ago
- Cog wrapper for microsoft/OmniParser-v2☆12Feb 25, 2025Updated last year
- Python library to forecast univariate time series through backtesting model selection☆23Jun 12, 2024Updated last year
- This is the microphone array generalization investigation based on previous Narrow Band Deep Filtering methods.☆38Mar 12, 2024Updated last year
- BigSmall: Efficient Multi-Task Learning for Disparate Spatial and Temporal Physiological Measurements☆41Feb 2, 2026Updated last month
- ☆37May 8, 2021Updated 4 years ago
- Using pre-trained YOLO algorithm to detect faces in photo ID documents for ID verification☆10Apr 3, 2018Updated 7 years ago
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 5 months ago
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols☆17Nov 19, 2025Updated 3 months ago
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models☆27Feb 13, 2026Updated 2 weeks ago
- The Ecoacoustic Dataset from Arctic North Slope Alaska☆11May 29, 2025Updated 9 months ago
- Ace-Step Dataset Generator☆23Sep 27, 2025Updated 5 months ago
- Implementation of "Look, Listen and Recognise: character-aware audio-visual subtitling"☆19Nov 3, 2025Updated 4 months ago
- Instagram Automation Tool is a framework that automates various Instagram tasks, including file-based operations and web automation (via …☆15May 4, 2025Updated 10 months ago
- ☆42Nov 22, 2024Updated last year
- The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions"☆50Apr 7, 2025Updated 10 months ago
- ISS Tracker for the Cardputer Adv☆36Jan 19, 2026Updated last month
- Contains scripts for downloading activities from your Garmin Connect account.☆13Apr 2, 2024Updated last year
- The A2C Reinforcement Learning Algorithm in PyTorch☆16May 13, 2024Updated last year
- ☆12Jun 2, 2025Updated 9 months ago
- An Android application that suggests trending hashtags based on a photo you upload.☆11Mar 24, 2021Updated 4 years ago
- Official Implementation of DMT: Dual Mean-Teacher in PyTorch.☆10Oct 27, 2023Updated 2 years ago
- A basic Google Docs document viewer.☆11Aug 22, 2019Updated 6 years ago
- Human-centric environment representations from egocentric video☆14Feb 5, 2026Updated last month
- A Gesture Recognition App using Microsoft Kinect V2☆10May 31, 2023Updated 2 years ago
- [CVPR'26 Findings] Source code for "RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglom…☆31Feb 24, 2026Updated last week
- In this repository, we deal with the task of video frame interpolation with estimated optical flow. To estimate the optical flow we use p…☆10Apr 5, 2021Updated 4 years ago
- Utilities for SignWriting☆12Updated this week
- Code for ASE'24 paper "B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests"☆11Sep 10, 2024Updated last year
- Official implementation of DGP-based multi-speaker speech synthesis with PyTorch☆24Mar 23, 2021Updated 4 years ago
- This repository contains the official implementation of the paper "LandSegmenter: Towards a Flexible Foundation Model for Land Use and La…☆26Dec 8, 2025Updated 2 months ago