wguo28 / SSV2A
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation.
☆64Updated 4 months ago
Alternatives and similar repositories for SSV2A
Users who are interested in SSV2A are comparing it to the libraries listed below.
- The enhanced model is specially trained for aquatic targets, achieving higher accuracy. It can detect sailboats, humans, other vessels, b…☆46Updated 2 months ago
- Dynamic human image animation with strong identity preservation, heterogeneous character driving, and controllable backgrounds.☆138Updated 2 months ago
- AI-powered tool for analyzing GitHub trending repositories and URL metadata☆25Updated 3 weeks ago
- Efficient controlnet for DiTs☆381Updated 2 months ago
- MTLA: Multi-head Temporal Latent Attention☆664Updated last month
- ☆161Updated 9 months ago
- ☆154Updated last year
- We introduce the Audio Logical Reasoning (ALR) dataset, consisting of 6,446 text-audio annotated samples specifically designed for comple…☆923Updated 3 weeks ago
- We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of Multimodal foundation models (MFM…☆310Updated 6 months ago
- An iOS network debugging library. It can monitor HTTP requests within the app and display information related to each request.☆15Updated 8 years ago
- Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs☆157Updated 4 months ago
- Liang - Non-functional requirements should be part of function interfaces☆1,015Updated 3 years ago
- ☆603Updated last year
- Data and code supporting data examples analysis in the paper "Assessing the interconnectedness and systemic risk contagion in the Chinese…☆20Updated 11 months ago
- Cheaper hcaptcha, recaptcha, recaptchav3, turnstile, 5s solver bypass☆506Updated this week
- Calendar software rewrite☆453Updated 4 months ago
- Framework that enables fine-tuning of vision-language grounding models on custom datasets☆602Updated 3 months ago
- ☆10Updated last month
- ☆516Updated 5 months ago
- [ICCV2025 Highlight] DicFace: Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration☆429Updated last week
- Spring Boot framework for implementing distributed transactions using reliable messaging with RabbitMQ☆413Updated 4 months ago
- A user-friendly ROS 2 bag filter with a graphical user interface (GUI) ✨☆27Updated 2 months ago
- OmniAgent Framework is an advanced, modular AI orchestration system that transforms Web3 development by seamlessly integrating artificial…☆320Updated 6 months ago
- [NeurIPS2024] MVGamba: Unify 3D Content Generation as State Space Sequence Modeling☆61Updated 7 months ago
- A Trusted Human-Multi-Agent Reinforcement Learning Interaction Framework☆503Updated this week
- Leveraging AI, this solution boosts 360° video quality through 4x upscaling with Real-ESRGAN. It integrates GFPGAN for smart face enhance…☆20Updated last month
- DeepWism R2 is a next-generation AGI system built on the T3CEDS framework (Thin-Thick-Thin Crowd Entropy Dynamics System), which redefine…☆1,025Updated last month
- An R package for Bayesian estimation of probit unfolding models for binary preference data. This R package is described in the paper "pum…☆13Updated 2 months ago
- ☆150Updated 10 months ago
- [MM 2024] Official code for VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness☆49Updated last year