ylingfeng/AVT

Introduces a novel Video Trimming (VT) task and proposes an agent-based approach (AVT) for detecting wasted footage, selecting valuable segments, and composing them into a coherent final video.

☆28

Alternatives and similar repositories for AVT

Users who are interested in AVT are comparing it to the libraries listed below.

icq-benchmark / icq-benchmark
☆19 · Jul 28, 2025 · Updated 11 months ago
jhayes14 / advsteg
Steganography via adversarial training
☆15 · Dec 1, 2018 · Updated 7 years ago
IMCCretrieval / MomentDiff
MomentDiff: Generative Video Moment Retrieval from Random to Real--NeurIPS 2023
☆80 · Nov 2, 2023 · Updated 2 years ago
shashankvkt / video_object_segmentation
Implementation of "Youtube-VOS: Sequence-to-sequence video object segmentation"
☆14 · Oct 15, 2019 · Updated 6 years ago
GAP-LAB-CUHK-SZ / DisCo
Implementation of DisCo:Diffusion-based Cross-modal Shape Reconstruction from LASA (CVPR2024)
☆19 · Nov 8, 2024 · Updated last year
PRIS-CV / CineTechBench
A Benchmark for Cinematographic Technique Understanding and Generation
☆29 · Sep 19, 2025 · Updated 10 months ago
zhouyiks / CoLVA
☆44 · Jul 9, 2025 · Updated last year
Lzq5 / Video-Text-Alignment
☆28 · Jul 18, 2025 · Updated last year
Z2HENG / DeTAL
☆18 · May 7, 2025 · Updated last year
snowflakewang / PBR_Boost_3DGen
[SIGGRAPH ASIA 2024 Conference Paper] Boosting 3D Object Generation through PBR Materials
☆16 · Dec 26, 2024 · Updated last year
zjukg / DUET
[Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
☆54 · Feb 9, 2024 · Updated 2 years ago
harrytea / ROOT
ROOT: VLM based System for Indoor Scene Understanding and Beyond
☆42 · Jan 22, 2025 · Updated last year
cyh-sj / FaceEdit3D
3D-Aware Face Editing via Warping-Guided Latent Direction Learning
☆24 · Jul 2, 2024 · Updated 2 years ago
simarmehta / chessAutomation_CV
This repository implements computer vision for real-time chessboard detection and piece recognition. Using OpenCV and Numpy, the system p…
☆15 · Sep 24, 2024 · Updated last year
NeverMoreLCH / Awesome-Video-Grounding
A reading list of papers about Visual Grounding.
☆31 · Aug 24, 2022 · Updated 3 years ago
SivanDoveh / TSVLC
Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models
☆47 · Sep 25, 2023 · Updated 2 years ago
zhenyuanlu / awesome-pain-intensity-classification-papers
A comprehensive list of pain intensity classification papers mainly based on deep learning algorithms
☆12 · Oct 20, 2024 · Updated last year
Jiaxing-star / LLaVA-Octopus
☆11 · Jan 8, 2025 · Updated last year
viktor4006094 / DegreeProject
Polarization parameters and polarizing filters in real-time ray tracing with DXR and the Stokes-Mueller calculus
☆13 · Jun 18, 2020 · Updated 6 years ago
zhengrongz / AoTD
[CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation".
☆58 · May 25, 2025 · Updated last year
LAION-AI / model-retrieval
Easily compute model embeddings and save the embeddings.
☆11 · Dec 10, 2022 · Updated 3 years ago
Phoenix1327 / ML-ZSL
☆11 · Nov 12, 2018 · Updated 7 years ago
fletcherjiang / LLMEPET
[MM'24 Oral] Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval
☆130 · Aug 23, 2024 · Updated last year
multimodal-art-projection / IV-Bench
☆14 · Apr 23, 2025 · Updated last year
CVMI-Lab / Hybrid-Occ-SDF
This is the officially implementation of ICCV 2023 paper " Learning A Room with the Occ-SDF Hybrid: Signed Distance Function Mingled with…
☆11 · Dec 7, 2023 · Updated 2 years ago
traveler-framework / TraveLER
[EMNLP 2024] TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering
☆18 · Oct 31, 2024 · Updated last year
TIGER-AI-Lab / PixelWorld
The official code of "PixelWorld: Towards Perceiving Everything as Pixels" [TMLR25]
☆15 · Sep 12, 2025 · Updated 10 months ago
To-Data-Beyond / Multimodal-RAG
Hands-On Tutorial on Building Multimodal RAG Systems
☆14 · Apr 10, 2025 · Updated last year
OliverRensu / GLSTR
☆13 · Jan 5, 2022 · Updated 4 years ago
dmoltisanti / air-cvpr23
This repository contains the Adverbs in Recipes (AIR) dataset and the code published at the CVPR 23 paper: "Learning Action Changes by Me…
☆13 · May 25, 2023 · Updated 3 years ago
Lookuz / VidHal
Codebase for VidHal: Benchmarking Hallucinations in Vision LLMs
☆14 · Apr 23, 2026 · Updated 2 months ago
liyingxuan1012 / zeroshot-speaker-prediction
Official repository of "Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion" (ACMMM 2024)
☆16 · Oct 31, 2024 · Updated last year
jialuli-luka / Video-MSG
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
☆28 · Apr 14, 2025 · Updated last year
zhongjinluo / SimpModeling
We present SimpModeling, a novel sketching system designed for amateur users to create desired animalmorphic heads.
☆38 · Jul 4, 2023 · Updated 3 years ago
facebookresearch / stepdiff
Data release for Step Differences in Instructional Video (CVPR24)
☆15 · Jun 19, 2024 · Updated 2 years ago
RUCAIBox / CIR
☆16 · Nov 11, 2025 · Updated 8 months ago
PRIS-CV / Category-Specific-Prompt
Code release for "Category-Specific Prompts for Animal Action Recognition with Pretrained Vision-Language Models"
☆14 · Feb 21, 2024 · Updated 2 years ago
ChenyunWu / PhraseCutDataset
Dataset API for "PhraseCut: Language-based Image Segmentation in the Wild"
☆116 · Mar 28, 2026 · Updated 3 months ago
MasterHow / OccFiner
Offboard Occupancy Refinement with Hybrid Propagation for Autonomous Driving
☆15 · Feb 10, 2025 · Updated last year