TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of untrimmed videos.
☆25Jun 4, 2025Updated 9 months ago
Alternatives and similar repositories for TEMPURA
Users that are interested in TEMPURA are comparing it to the libraries listed below
Sorting:
- ☆16Dec 4, 2024Updated last year
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Feb 13, 2025Updated last year
- Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact…☆42Feb 5, 2025Updated last year
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding☆40Mar 16, 2025Updated 11 months ago
- DisTime: Distribution-based Time Representation for Video Large Language Models.☆18Jul 10, 2025Updated 7 months ago
- MR. Video: MapReduce is the Principle for Long Video Understanding☆30Apr 23, 2025Updated 10 months ago
- 【CVPRW'23】First Place Solution to the CVPR'2023 AQTC Challenge☆15Jul 18, 2023Updated 2 years ago
- Official Implementation of Video-MA2MBA☆12Dec 3, 2024Updated last year
- [ICCV 2025] Dynamic-VLM☆28Dec 16, 2024Updated last year
- FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding. (WACV2025)☆34Apr 17, 2025Updated 10 months ago
- CVPR 24 paper: Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs☆14Mar 19, 2024Updated last year
- Data release for Step Differences in Instructional Video (CVPR24)☆14Jun 19, 2024Updated last year
- ☆37Nov 8, 2024Updated last year
- Official code repository of Shuffle-R1☆25Feb 23, 2026Updated last week
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)☆74Jan 20, 2025Updated last year
- ☆42Jul 9, 2025Updated 7 months ago
- [ICCV'2023] Compositional Feature Augmentation for Unbiased Scene Graph Generation☆15Dec 5, 2023Updated 2 years ago
- [CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling☆195Feb 23, 2026Updated last week
- [AAAI 26 Demo] Offical repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P …☆64Jan 27, 2026Updated last month
- This repo holds the implementation of PAVE: Patching and Adapting Video Large Language Models (CVPR2025)☆26Sep 6, 2025Updated 5 months ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆21Feb 27, 2025Updated last year
- (NeurIPS 2024) BiDM: Pushing the Limit of Quantization for Diffusion Models☆22Nov 20, 2024Updated last year
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models☆51Jun 12, 2025Updated 8 months ago
- ☆80Nov 24, 2024Updated last year
- Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query (ICCV2021)☆20Dec 4, 2021Updated 4 years ago
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"☆31Feb 22, 2026Updated last week
- This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos☆19Mar 3, 2025Updated last year
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"☆55Oct 9, 2025Updated 4 months ago
- ☆30Jan 18, 2026Updated last month
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model☆22Aug 5, 2024Updated last year
- Concat-ID: Towards Universal Identity-Preserving Video Synthesis☆66May 7, 2025Updated 9 months ago
- ☆34Oct 9, 2025Updated 4 months ago
- ☆21Oct 31, 2024Updated last year
- ☆26Mar 26, 2025Updated 11 months ago
- ☆28Apr 8, 2025Updated 10 months ago
- Official implementation of "A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives", accepted at CVPR 2…☆24Jun 13, 2024Updated last year
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability☆106Nov 28, 2024Updated last year
- Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"☆29Apr 16, 2024Updated last year
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆64Jul 22, 2025Updated 7 months ago