lucas-ventura / chapter-llama
Official PyTorch implementation of the paper "Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs"
☆59 · Updated last week
Alternatives and similar repositories for chapter-llama:
Users interested in chapter-llama are comparing it to the repositories listed below.
- Code for "Don’t drop your samples! Coherence-aware training benefits Conditional diffusion" CVPR 2024 Highlight ☆52 · Updated 2 months ago
- Reliability in Semantic Segmentation: Can We Use Synthetic Data? (ECCV 2024) ☆35 · Updated 9 months ago
- Code for "How far can we go with ImageNet for Text-to-Image generation?" paper ☆86 · Updated last month
- Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions". ☆106 · Updated 3 weeks ago
- Code for "Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation" ☆67 · Updated 3 weeks ago
- Official implementation of the Polynomial Mixer ☆18 · Updated 2 months ago
- Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024) ☆27 · Updated 8 months ago
- Code for the paper "AMEGO: Active Memory from long EGOcentric videos" published at ECCV 2024 ☆38 · Updated 4 months ago
- Official implementation of "URECA: Unique Region Caption Anything" ☆42 · Updated 3 weeks ago
- (CVPRW 2022) Learning Co-segmentation by Segment Swapping for Retrieval and Discovery ☆54 · Updated 2 years ago
- [WACV 2024] Learning the What and How of Annotation in Video Object Segmentation ☆26 · Updated 10 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆37 · Updated 10 months ago
- FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024 ☆20 · Updated 4 months ago
- [ArXiv 2025] Official Implementation of "Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation". Junyu Xie, Tengda … ☆11 · Updated last month
- This repo contains the official implementation of ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long … ☆88 · Updated 11 months ago
- Official implementation of "A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives", accepted at CVPR 2… ☆18 · Updated 10 months ago
- Code implementation for the paper "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision" ☆26 · Updated last year
- [CVPR 2024 Highlight] SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers ☆63 · Updated 10 months ago
- Diffusion Models as Data Mining Tools ☆54 · Updated 2 months ago
- Toolbox for the Earth Parser Dataset, a dataset presented in the "Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans" pape… ☆25 · Updated last year
- [ECCV'24] Official PyTorch implementation of In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation ☆40 · Updated 7 months ago
- Code and data release for the paper "Learning Object State Changes in Videos: An Open-World Perspective" (CVPR 2024) ☆32 · Updated 7 months ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆43 · Updated 3 months ago