[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
☆49Jan 9, 2024Updated 2 years ago
Alternatives and similar repositories for TESTA
Users that are interested in TESTA are comparing it to the libraries listed below
Sorting:
- [TMLR'24] This repository includes the official implementation our paper "FedConv: Enhancing Convolutional Neural Networks for Handling D…☆25Apr 30, 2024Updated last year
- hello☆57Oct 21, 2024Updated last year
- ☆18Jul 10, 2024Updated last year
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model☆16Jan 31, 2024Updated 2 years ago
- [ECAI 2023] MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient☆32Dec 8, 2023Updated 2 years ago
- Official Pytorch Implementation of Self-emerging Token Labeling☆35Mar 27, 2024Updated last year
- 2D road segmentation using lidar data during training☆43Dec 21, 2023Updated 2 years ago
- PyTorch Implementation of "ASTRA: An Action Spotting TRAnsformer for Soccer Videos", ACM MMSports 2023. | 3rd place solution for SoccerNe…☆42May 20, 2024Updated last year
- Create and share easy-to-make, built-to-last, innovative, and customizable experiences☆34Feb 21, 2024Updated 2 years ago
- This repository is the project page for "Point Anywhere: Directed Object Estimation from Omnidirectional Images", including source code …☆12Aug 25, 2023Updated 2 years ago
- ☆25Sep 19, 2023Updated 2 years ago
- This repository contains the resource introduced in the paper: "Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis"…☆25Oct 15, 2025Updated 4 months ago
- A simple python package to stretch audio files and change their speed☆12Feb 18, 2026Updated 2 weeks ago
- Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER" @COLING-2022☆11Aug 20, 2022Updated 3 years ago
- multimodal change detection☆47Sep 20, 2024Updated last year
- [CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding☆410May 8, 2025Updated 10 months ago
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments☆13Jul 8, 2024Updated last year
- ☆12Dec 15, 2023Updated 2 years ago
- Code of the paper "Efficient Object Detection in Autonomous Driving using Spiking Neural Networks: Performance, Energy Consumption Analys…☆27Dec 13, 2023Updated 2 years ago
- ☆83Oct 31, 2024Updated last year
- ☆31Nov 17, 2024Updated last year
- Pytorch code for paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models☆207Jan 8, 2025Updated last year
- Low-latency Space-time Supersampling for Real-time Rendering☆33Feb 1, 2024Updated 2 years ago
- Official repo for EMNLP 2023 paper "Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations…☆29Dec 5, 2023Updated 2 years ago
- [CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning☆126Dec 28, 2024Updated last year
- Detecting Deepfakes Without Seeing Any☆164Jul 25, 2024Updated last year
- [AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues☆61May 2, 2025Updated 10 months ago
- Video Feature Enhancement with PyTorch☆32Nov 28, 2024Updated last year
- Self-hosted GPT-4V api☆27Nov 6, 2023Updated 2 years ago
- ☆32Mar 1, 2024Updated 2 years ago
- [Pattern Recognition 2024] Semantic-Aware Frame-Event Fusion based Pattern Recognition via Large Vision-Language Models, Dong Li, Jiandon…☆18Jan 18, 2025Updated last year
- [NeurIPS 2023 Datasets and Benchmarks] "FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation", Yuanxin L…☆57Mar 4, 2024Updated 2 years ago
- Code and data for CoachLM, an automatic instruction revision approach LLM instruction tuning.☆60Mar 20, 2024Updated last year
- Official code for "3HAN: A Deep Neural Network for Fake News Detection" (ICONIP 2017)☆88Jun 21, 2018Updated 7 years ago
- The official repo for "GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction"☆29Mar 29, 2024Updated last year
- This repository contains a framework for converting monocular videos into side-by-side (SBS) 3D videos. It utilizes a combination of imag…☆90Feb 11, 2024Updated 2 years ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆69Jun 9, 2024Updated last year
- DALI Multi Agent System Framework☆42Jan 30, 2026Updated last month
- ☆37Jan 20, 2024Updated 2 years ago