ZYH-Lightyear/LVAS



LVAS-Agent Code Base

☆21

Alternatives and similar repositories for LVAS

Users who are interested in LVAS are comparing it to the repositories listed below.


EnVision-Research / PhysToolBench
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
☆30 · Updated this week
jnwnlee / selva
[CVPR 2026] Official PyTorch implementation of SelVA "Hear What Matters! Text-conditioned Selective Video-to-Audio Generation"
☆15 · Mar 27, 2026 · Updated 3 months ago
hkchengrex / av-benchmark
Benchmarking for Audio-Text and Audio-Visual Generation; Supports FAD, FD_VGG, FD_PANNs, FD_PaSST, IS_PaSST, IS_PANNs, KL_PaSST, KL_PANNs…
☆79 · Feb 14, 2026 · Updated 5 months ago
EnVision-Research / FractFlow
☆25 · Jul 28, 2025 · Updated 11 months ago
ypwang61 / StoryEval
[CVPR2025] Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation
☆21 · May 2, 2025 · Updated last year
lxa9867 / QSD
[CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition"
☆12 · Feb 27, 2024 · Updated 2 years ago
LinglingCai0314 / FreeMask
☆11 · Jan 18, 2025 · Updated last year
EnVision-Research / LucidFusion
Official implementation of “LucidFusion: Reconstructing 3D Gaussians with Arbitrary Unposed Images”
☆76 · Mar 21, 2025 · Updated last year
wz0919 / DreamRunner
[AAAI 2026] Official implementation of DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation
☆78 · Jun 11, 2025 · Updated last year
xiaoqian-shen / MoStGAN-V
[CVPR 2023] Official PyTorch implementation of MoStGAN-V
☆25 · Jun 15, 2023 · Updated 3 years ago
maziao / T2I-Eval
[ACL 2025 Main] Open-source toolkit for automatic evaluation of text-to-image generation task, including training & test datasets and a d…
☆20 · Jul 5, 2025 · Updated last year
EnVision-Research / TASC
☆27 · Apr 28, 2025 · Updated last year
zhengxuJosh / SAM4SS
SAM4SS: Tailoring SAM and SAM2 for Semantic Segmentation
☆11 · Jul 31, 2024 · Updated last year
schowdhury671 / meerkat
☆35 · Jul 9, 2025 · Updated last year
celoron / ComfyUI-VisualQueryTemplate
A ComfyUI node for transforming images into descriptive text using templated visual question answering. Leverages Hugging Face's VQA mode…
☆14 · Apr 1, 2025 · Updated last year
ZijiaLewisLu / CVPR2025-DeCafNet
Official Repo for CVPR 2025 Paper -- DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos
☆17 · Mar 16, 2026 · Updated 4 months ago
kaiw7 / STG-CMA
Towards Efficient Audio-Visual Learners via Empowering Pre-trained Vision Transformers with Cross-Modal Adaptation
☆15 · Apr 13, 2024 · Updated 2 years ago
ku-vai / TPoS
This repository is for The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV 2023)
☆25 · Dec 7, 2023 · Updated 2 years ago
Yusiissy / SonicVisionLM
☆75 · Jan 8, 2024 · Updated 2 years ago
SitongGong / Veason-R1
Official code of Veason-R1
☆15 · Jul 14, 2026 · Updated last week
GeWu-Lab / Stepping-Stones
The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024
☆18 · Oct 11, 2024 · Updated last year
HuMengXue0104 / MVAD
MVAD is the first general-purpose dataset specifically designed for detecting AI-generated multimodal video-audio content.
☆21 · Apr 25, 2026 · Updated 2 months ago
traugdor / ComfyUI-quadMoons-nodes
Repository for all the nodes I created on my own for ComfyUI.
☆16 · Dec 4, 2025 · Updated 7 months ago
1llusion / deepdreamtravel
Generate deep dream videos from a single image.
☆14 · Feb 16, 2023 · Updated 3 years ago
Evergreen0929 / Spherical-Projection-Shape-Generation
The official implementation of 'SPGen: Spherical Projection as Consistent and Flexible Representation for Single Image 3D Shape Generatio…
☆17 · Dec 14, 2025 · Updated 7 months ago
j1anglin / ReCorD
[ACM MM 2024] Reasoning and Correcting Diffusion for HOI Generation
☆14 · Oct 1, 2024 · Updated last year
kaist-ami / AVHBench
[ICLR'25] Official repository for "AVHBench: A Cross-Modal Hallucination Evaluation for Audio-Visual Large Language Models"
☆25 · Mar 8, 2026 · Updated 4 months ago
ariel415el / PerceptualLossExperiments
Examines the impact of perceptual loss and its alternatives on GLO
☆15 · Nov 22, 2021 · Updated 4 years ago
nkchocoai / ComfyUI-TextOnSegs
Custom node for ComfyUI. Add a node for drawing text to the area of SEGS.
☆14 · Mar 30, 2025 · Updated last year
zhengxuJosh / AnySeg
Code & Weights for “Learning Robust Anymodal Segmentor with Unimodal and Cross-modal Distillation”
☆15 · Dec 6, 2024 · Updated last year
FuchenUSTC / VideoStudio
☆33 · Jul 5, 2024 · Updated 2 years ago
EthanLiang99 / AuthFace
AuthFace: Towards Authentic Blind Face Restoration with Face-oriented Generative Diffusion Prior (ACM MM 2025 Oral)
☆18 · Mar 5, 2026 · Updated 4 months ago
Rongjiehuang / awesome-speech-to-speech-translation
List of direct speech-to-speech translation papers.
☆39 · Jan 31, 2023 · Updated 3 years ago
William-N-Havard / SpeechCoco
☆12 · Nov 23, 2020 · Updated 5 years ago
ictnlp / GMA
Code for ACL 2022 findings paper "Gaussian Multi-head Attention for Simultaneous Machine Translation"
☆11 · Mar 31, 2022 · Updated 4 years ago
bertvanbrakel / mcp-cadquery
☆17 · Apr 7, 2025 · Updated last year
simarmehta / chessAutomation_CV
This repository implements computer vision for real-time chessboard detection and piece recognition. Using OpenCV and Numpy, the system p…
☆15 · Sep 24, 2024 · Updated last year
QC-LY / UiG
Code for "Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation"
☆15 · Nov 11, 2025 · Updated 8 months ago
WikiChao / Ego-AV-Loc
[CVPR 2023] Egocentric Audio-Visual Object Localization
☆27 · Jan 6, 2024 · Updated 2 years ago