Neleac / SpaceTimeGPT
Vision-language model for video description generation
☆19 Updated 8 months ago
Alternatives and similar repositories for SpaceTimeGPT
Users who are interested in SpaceTimeGPT are comparing it to the repositories listed below.
- ☆185 Updated 11 months ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆82 Updated 2 months ago
- ☆79 Updated last year
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties ☆131 Updated 11 months ago
- Language Repository for Long Video Understanding ☆32 Updated last year
- PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models ☆257 Updated 2 months ago
- Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scra… ☆52 Updated 2 years ago
- Supercharged BLIP-2 that can handle videos ☆121 Updated last year
- [NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale ☆198 Updated last year