ChanganVR/action2sound

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/ChanganVR/action2sound)

ChanganVR / action2sound

Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

☆25

Alternatives and similar repositories for action2sound

Users that are interested in action2sound are comparing it to the libraries listed below

Sorting:

Ego4DSounds / Ego4DSounds
View on GitHub
Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence
☆19Jun 14, 2024Updated last year
jnwnlee / video-foley
View on GitHub
Official implementation of "Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound". IEEE TASLP 20…
☆17Feb 27, 2026Updated last week
chentuochao / Target-Conversation-Extraction
View on GitHub
This is the code and dataset repo for Interspeech 2024 paper "Target conversation extraction: Source separation using turn-taking dynamic…
☆55Aug 15, 2025Updated 6 months ago
aispeech-lab / advr-avss
View on GitHub
Pytorch implementation of our paper: Audio-Visual Speech Separation with Visual Features Enhanced by Adversarial Training.
☆18Jul 11, 2022Updated 3 years ago
BingYang-20 / DP-RTF-Learning
View on GitHub
A python implementation of “Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization” [TASLP 2021]
☆27Feb 11, 2023Updated 3 years ago
roger-tseng / av-superb
View on GitHub
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
☆58Apr 17, 2024Updated last year
penfever / wildchat-50m
View on GitHub
Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family.
☆34Apr 1, 2025Updated 11 months ago
RBenita / DIFFAR
View on GitHub
Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
☆32Mar 8, 2024Updated last year
AnupBhat30 / ilya-sutskever-ai-reading-list
View on GitHub
☆29Feb 8, 2026Updated 3 weeks ago
lucataco / cog-OmniParser-v2
View on GitHub
Cog wrapper for microsoft/OmniParser-v2
☆12Feb 25, 2025Updated last year
adevinta / forecastout
View on GitHub
Python library to forecast univariate time series through backtesting model selection
☆23Jun 12, 2024Updated last year
RusselZHANG / Microphone-Array-Generalization-for-Multichannel-Narrowband-Deep-Speech-Enhancement
View on GitHub
This is the microphone array generalization investigation based on previous Narrow Band Deep Filtering methods.
☆38Mar 12, 2024Updated last year
girishvn / BigSmall
View on GitHub
BigSmall: Efficient Multi-Task Learning for Disparate Spatial and Temporal Physiological Measurements
☆41Feb 2, 2026Updated last month
rhoposit / multilingual_VQVAE
View on GitHub
☆37May 8, 2021Updated 4 years ago
anjalibshah / KYC-ID-Verification-YOLO
View on GitHub
Using pre-trained YOLO algorithm to detect faces in photo ID documents for ID verification
☆10Apr 3, 2018Updated 7 years ago
Yifei-Zuo / Flash-LLA
View on GitHub
Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…
☆23Oct 1, 2025Updated 5 months ago
FreedomIntelligence / MTalk-Bench
View on GitHub
MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols
☆17Nov 19, 2025Updated 3 months ago
NARUTO-2024 / WavBench
View on GitHub
WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models
☆27Feb 13, 2026Updated 2 weeks ago
speechLabBcCuny / EDANSA
View on GitHub
The Ecoacoustic Dataset from Arctic North Slope Alaska
☆11May 29, 2025Updated 9 months ago
methmx83 / Ace-Step_Data-Tool
View on GitHub
Ace-Step Dataset Generator
☆23Sep 27, 2025Updated 5 months ago
JaesungHuh / ca-subtitle
View on GitHub
Implementation of "Look, Listen and Recognise:character-aware audio-visual subtitling"
☆19Nov 3, 2025Updated 4 months ago
AhmadovMahammad / instagram-data-manager
View on GitHub
Instagram Automation Tool is a framework that automates various Instagram tasks, including file-based operations and web automation (via …
☆15May 4, 2025Updated 10 months ago
zexupan / MuSE
View on GitHub
☆42Nov 22, 2024Updated last year
cuhealthybrains / MT-LLM
View on GitHub
The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions"
☆50Apr 7, 2025Updated 10 months ago
adammelancon / cardputer-satellite-tracker
View on GitHub
ISS Tracker for the Cardputer Adv
☆36Jan 19, 2026Updated last month
mvaneijken / GarminConnect
View on GitHub
Contains scripts for downloading activities from your Garmin Connect account.
☆13Apr 2, 2024Updated last year
Lucasc-99 / Actor-Critic
View on GitHub
The A2C Reinforcement Learning Algorithm in Pytorch
☆16May 13, 2024Updated last year
knowledgetechnologyuhh / gasp
View on GitHub
☆12Jun 2, 2025Updated 9 months ago
SuvojitBarick / HasiGen
View on GitHub
This is an android application which suggests you trending hashtags based on photo you upload.
☆11Mar 24, 2021Updated 4 years ago
gyx-gloria / DMT
View on GitHub
Official Implementation of DMT: Dual Mean-Teacher in PyTorch.
☆10Oct 27, 2023Updated 2 years ago
ml-archive / google-docs-viewer-flutter
View on GitHub
A basic Google Docs document viewer.
☆11Aug 22, 2019Updated 6 years ago
facebookresearch / ego-env
View on GitHub
Human-centric environment representations from egocentric video
☆14Feb 5, 2026Updated last month
chethanMysore / gesture-recognition-app
View on GitHub
A Gesture Recognition App using Microsoft Kinect V2
☆10May 31, 2023Updated 2 years ago
RADSeg-OVSS / RADSeg
View on GitHub
[CVPR'26 Findings] Source code for "RADSeg Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglom…
☆31Feb 24, 2026Updated last week
vineeths96 / Video-Interpolation-using-Deep-Optical-Flow
View on GitHub
In this repository, we deal with the task of video frame interpolation with estimated optical flow. To estimate the optical flow we use p…
☆10Apr 5, 2021Updated 4 years ago
sign-language-processing / signwriting
View on GitHub
Utilities for SignWriting
☆12Updated this week
ZJU-CTAG / B4
View on GitHub
Code for ASE'24 paper "B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests"
☆11Sep 10, 2024Updated last year
sarulab-speech / multi-speaker-dgp
View on GitHub
Official implementation of DGP-based multi-speaker speech synthesis with PyTorch
☆24Mar 23, 2021Updated 4 years ago
zhu-xlab / LandSegmenter
View on GitHub
This repository contains the official implementation of the paper "LandSegmenter: Towards a Flexible Foundation Model for Land Use and La…
☆26Dec 8, 2025Updated 2 months ago