This is the official repository of LLAVIDAL
☆23 · Oct 4, 2025 · Updated 4 months ago
Alternatives and similar repositories for LLAVIDAL
Users interested in LLAVIDAL are comparing it to the repositories listed below.
- Official Repository of "Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads" ☆17 · Oct 6, 2025 · Updated 4 months ago
- Code for the paper Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers ☆21 · Aug 2, 2024 · Updated last year
- [CVPR 2024] Code and models for pi-ViT, a video transformer for understanding activities of daily living ☆30 · Nov 12, 2025 · Updated 3 months ago
- Video + CLIP Baseline for Ego4D Long Term Action Anticipation Challenge (CVPR 2022) ☆15 · Jul 4, 2022 · Updated 3 years ago
- [AAAI 2025] Official Repository of 'SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living' ☆23 · Sep 17, 2025 · Updated 5 months ago
- [Main Conference @ EACL'26] [Workshop @ NeurIPS'24] 🎞️ LVNet. ☆42 · Feb 10, 2026 · Updated 2 weeks ago
- Code for NeurIPS 2022 paper "Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space" ☆20 · Apr 20, 2023 · Updated 2 years ago
- [CVPR 2026] Official Repository of 'MS-Temba: Multi-Scale Temporal Mamba for Understanding Long Untrimmed Videos' ☆36 · Jan 23, 2026 · Updated last month
- Data release for Step Differences in Instructional Video (CVPR24) ☆14 · Jun 19, 2024 · Updated last year
- ☆18 · Dec 17, 2022 · Updated 3 years ago
- A simple and flexible PyTorch implementation of Video StableDiffusion (ZeroScope_v2) based on diffusers. ☆19 · Feb 15, 2024 · Updated 2 years ago
- [WIP] Code for LangToMo ☆20 · Jun 25, 2025 · Updated 8 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling ☆145 · Aug 22, 2025 · Updated 6 months ago
- WACV 2024: "PathLDM: Text conditioned Latent Diffusion Model for Histopathology" ☆48 · Jul 7, 2024 · Updated last year
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ☆32 · May 27, 2025 · Updated 9 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆64 · Jul 22, 2025 · Updated 7 months ago
- ☆27 · Jul 18, 2025 · Updated 7 months ago
- Actor-agnostic Multi-label Action Recognition with Multi-modal Query [ICCVW '23] ☆24 · Oct 20, 2023 · Updated 2 years ago
- Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors ☆31 · Jun 2, 2024 · Updated last year
- ☆30 · Aug 14, 2023 · Updated 2 years ago
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding" ☆34 · Jun 17, 2024 · Updated last year
- Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model' ☆33 · Nov 7, 2023 · Updated 2 years ago
- [ICRA'24] Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning ☆70 · Aug 4, 2024 · Updated last year
- ☆80 · Nov 24, 2024 · Updated last year
- Computing calibrated prediction intervals for neural network regressors ☆10 · May 28, 2019 · Updated 6 years ago
- A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios ☆13 · Jan 24, 2024 · Updated 2 years ago
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding ☆40 · Mar 16, 2025 · Updated 11 months ago
- This repository provides the codes for MMA-DFER: multimodal (audiovisual) emotion recognition method. This is an official implementation … ☆50 · Sep 16, 2024 · Updated last year
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training. ☆83 · Dec 24, 2025 · Updated 2 months ago
- Perceptual Grouping in Contrastive Vision-Language Models (ICCV'23) ☆37 · Jan 1, 2024 · Updated 2 years ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆151 · Sep 10, 2024 · Updated last year
- This is a python library. Install with "python3 -m pip install rp" then run with "python3 -m rp" or just "rp". Requires python≥3.5 ☆13 · Feb 16, 2026 · Updated 2 weeks ago
- ☆11 · Dec 6, 2024 · Updated last year
- Prediction Intervals: Split Normal Mixture from Quality-Driven Deep Ensembles. Published at Uncertainty in AI (UAI) 2020. ☆11 · Aug 31, 2020 · Updated 5 years ago
- ☆24 · Feb 7, 2026 · Updated 3 weeks ago
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". ☆54 · May 25, 2025 · Updated 9 months ago
- ☆14 · Aug 29, 2024 · Updated last year
- Official Implementation for ACM MM2024 paper "VrdONE: One-stage Video Visual Relation Detection". ☆11 · Nov 13, 2024 · Updated last year
- EgoToM is an egocentric theory-of-mind benchmark built on Ego4D videos, containing multi-choice questions that evaluate multimodal large … ☆13 · Apr 1, 2025 · Updated 11 months ago