FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, and beyond.
☆104 · Dec 25, 2025 · Updated 2 months ago
Alternatives and similar repositories for FunQA
Users who are interested in FunQA are comparing it to the repositories listed below.
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 · Dec 14, 2023 · Updated 2 years ago
- Benchmarking and Analyzing Generative Data for Visual Recognition ☆26 · Jul 25, 2023 · Updated 2 years ago
- Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image. ☆462 · Jul 4, 2023 · Updated 2 years ago
- On-Device Domain Generalization ☆46 · Nov 9, 2022 · Updated 3 years ago
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon! ☆11 · May 24, 2023 · Updated 2 years ago
- [IJCV 2025] Code for DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection ☆60 · Dec 24, 2024 · Updated last year
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators" ☆105 · Nov 9, 2023 · Updated 2 years ago
- [IJCV 2026] Official Code for "PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds" ☆70 · Feb 12, 2026 · Updated 2 weeks ago
- 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp… ☆3,338 · Mar 5, 2024 · Updated last year
- General video interaction platform based on LLMs, including Video ChatGPT ☆256 · Jul 26, 2023 · Updated 2 years ago
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM ☆20 · May 22, 2025 · Updated 9 months ago
- Toolbox for HuMMan Dataset ☆126 · Dec 7, 2024 · Updated last year
- A framework that allows you to apply Sparse AutoEncoder on any models ☆51 · Jul 11, 2025 · Updated 7 months ago
- [ECCV 2022] StyleLight: HDR Panorama Generation for Lighting Estimation and Editing ☆147 · Oct 9, 2023 · Updated 2 years ago
- ☆32 · Feb 8, 2024 · Updated 2 years ago
- The official repository of "Video assistant towards large language model makes everything easy" ☆232 · Dec 24, 2024 · Updated last year
- [CVPR2023] All in One: Exploring Unified Video-Language Pre-training ☆281 · Mar 25, 2023 · Updated 2 years ago
- ☆155 · Oct 31, 2024 · Updated last year
- [CVPR 2022 Oral] Versatile Multi-Modal Pre-Training for Human-Centric Perception ☆124 · Jun 23, 2022 · Updated 3 years ago
- [TPAMI] Searching prompt modules for parameter-efficient transfer learning. ☆238 · Dec 8, 2023 · Updated 2 years ago
- A local AI assistant running on your device. It turns your files into actionable memory. ☆54 · Feb 15, 2026 · Updated 2 weeks ago
- ☆16 · Jul 23, 2024 · Updated last year
- Long Context Transfer from Language to Vision ☆402 · Mar 18, 2025 · Updated 11 months ago
- [NeurIPS 2021] Garment4D: Garment Reconstruction from Point Cloud Sequences ☆137 · Dec 11, 2021 · Updated 4 years ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" ☆53 · Sep 21, 2023 · Updated 2 years ago
- Compress conventional Vision-Language Pre-training data ☆53 · Sep 22, 2023 · Updated 2 years ago
- Toolbox for GTA-Human Datasets ☆25 · Oct 9, 2024 · Updated last year
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team. ☆74 · Oct 14, 2024 · Updated last year
- Code for paper "Half-Physics: Enabling Kinematic 3D Human Model with Physical Interactions". Coming soon. ☆33 · Jul 31, 2025 · Updated 7 months ago
- (CVPR2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆360 · Jan 14, 2025 · Updated last year
- ☆37 · Oct 7, 2023 · Updated 2 years ago
- [ECCV 2022 & IJCV 2025] PyTorch code for SeqDeepFake: Detecting and Recovering Sequential DeepFake Manipulation ☆150 · Dec 3, 2024 · Updated last year
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆183 · Sep 26, 2025 · Updated 5 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆58 · Jun 27, 2023 · Updated 2 years ago
- Aligning LMMs with Factually Augmented RLHF ☆392 · Nov 1, 2023 · Updated 2 years ago
- [IEEE TPAMI-2024] Pair then Relation: Pair-Net for Panoptic Scene Graph Generation ☆99 · Nov 20, 2024 · Updated last year
- Official code release for DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields (ICCV 2023) ☆56 · May 24, 2024 · Updated last year
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ☆296 · Mar 13, 2024 · Updated last year
- Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25) ☆23 · Nov 25, 2025 · Updated 3 months ago