RenShuhuai-Andy / TESTA
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
☆49 · Jan 9, 2024 · Updated 2 years ago
Alternatives and similar repositories for TESTA
Users who are interested in TESTA are comparing it to the libraries listed below.
- [TMLR'24] This repository includes the official implementation of our paper "FedConv: Enhancing Convolutional Neural Networks for Handling D… ☆25 · Apr 30, 2024 · Updated last year
- ☆18 · Jul 10, 2024 · Updated last year
- Vision Large Language Models trained on M3IT instruction tuning dataset ☆17 · Aug 16, 2023 · Updated 2 years ago
- This repository holds the "Fully automated landmarking and facial segmentation on 3D photographs" files ☆30 · Oct 23, 2023 · Updated 2 years ago
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model ☆16 · Jan 31, 2024 · Updated 2 years ago
- ☆47 · Jan 18, 2024 · Updated 2 years ago
- Exploration of the multimodal fuyu-8b model of Adept. 🤓 🔍 ☆27 · Nov 7, 2023 · Updated 2 years ago
- Official PyTorch Implementation of Self-emerging Token Labeling ☆35 · Mar 27, 2024 · Updated last year
- Code for the paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork" ☆33 · Nov 29, 2023 · Updated 2 years ago
- 2D road segmentation using lidar data during training ☆43 · Dec 21, 2023 · Updated 2 years ago
- PyTorch Implementation of "ASTRA: An Action Spotting TRAnsformer for Soccer Videos", ACM MMSports 2023. | 3rd place solution for SoccerNe… ☆41 · May 20, 2024 · Updated last year
- Create and share easy-to-make, built-to-last, innovative, and customizable experiences ☆34 · Feb 21, 2024 · Updated last year
- ☆25 · Sep 19, 2023 · Updated 2 years ago
- multimodal change detection ☆46 · Sep 20, 2024 · Updated last year
- Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER" @COLING-2022 ☆11 · Aug 20, 2022 · Updated 3 years ago
- A simple python package to stretch audio files and change their speed ☆12 · Jan 16, 2026 · Updated last month
- [CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding ☆409 · May 8, 2025 · Updated 9 months ago
- The official implementation for Collaborative Word-based Pre-trained Item Representation for Transferable Recommendation. ☆25 · Jan 30, 2024 · Updated 2 years ago
- Browser automation for creating new pages in WordPress ☆13 · Jun 7, 2025 · Updated 8 months ago
- ☆12 · Dec 15, 2023 · Updated 2 years ago
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments ☆13 · Jul 8, 2024 · Updated last year
- Wave Partial Differential Equation Solver in Python ☆14 · Jun 5, 2024 · Updated last year
- Code of the paper "Efficient Object Detection in Autonomous Driving using Spiking Neural Networks: Performance, Energy Consumption Analys… ☆27 · Dec 13, 2023 · Updated 2 years ago
- ☆31 · Nov 17, 2024 · Updated last year
- PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models" ☆206 · Jan 8, 2025 · Updated last year
- Official repo for EMNLP 2023 paper "Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations… ☆29 · Dec 5, 2023 · Updated 2 years ago
- Low-latency Space-time Supersampling for Real-time Rendering ☆33 · Feb 1, 2024 · Updated 2 years ago
- Detecting Deepfakes Without Seeing Any ☆164 · Jul 25, 2024 · Updated last year
- [AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues ☆60 · May 2, 2025 · Updated 9 months ago
- ☆32 · Mar 1, 2024 · Updated last year
- Self-hosted GPT-4V API ☆27 · Nov 6, 2023 · Updated 2 years ago
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning. ☆60 · Mar 20, 2024 · Updated last year
- [NeurIPS 2023 Datasets and Benchmarks] "FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation", Yuanxin L… ☆57 · Mar 4, 2024 · Updated last year
- [Pattern Recognition 2024] Semantic-Aware Frame-Event Fusion based Pattern Recognition via Large Vision-Language Models, Dong Li, Jiandon… ☆18 · Jan 18, 2025 · Updated last year
- The official repo for "GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction" ☆29 · Mar 29, 2024 · Updated last year
- XmodelLM ☆38 · Nov 19, 2024 · Updated last year
- This repository contains a framework for converting monocular videos into side-by-side (SBS) 3D videos. It utilizes a combination of imag… ☆90 · Feb 11, 2024 · Updated 2 years ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆68 · Jun 9, 2024 · Updated last year
- DALI Multi Agent System Framework ☆42 · Jan 30, 2026 · Updated 2 weeks ago