Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scratch on YouTube (YT-1B dataset).
☆54 Apr 21, 2023 Updated 3 years ago
Alternatives and similar repositories for video-pretrained-transformer
Users that are interested in video-pretrained-transformer are comparing it to the libraries listed below.
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆27 Jun 22, 2026 Updated last week
- [NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale ☆209 Nov 13, 2023 Updated 2 years ago
- PyTorch implementation of Semantics-Assisted Video Captioning ☆11 Feb 16, 2023 Updated 3 years ago
- A guide to structured generation using constrained decoding ☆18 Jun 9, 2024 Updated 2 years ago
- Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta ☆15 Nov 11, 2024 Updated last year
- Code for "Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation" ☆25 Mar 9, 2024 Updated 2 years ago
- Extension of hLSTMat ☆19 Apr 15, 2021 Updated 5 years ago
- ☆57 Apr 24, 2024 Updated 2 years ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers. ☆50 Jun 16, 2023 Updated 3 years ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆154 Sep 10, 2024 Updated last year
- AI wearable necklace ☆13 Jul 29, 2024 Updated last year
- EDUVSUM is a multimodal neural architecture that utilizes state-of-the-art audio, visual and textual features to identify important tempo… ☆23 Mar 8, 2024 Updated 2 years ago
- Official implementation for the paper "Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos" ☆28 Dec 8, 2023 Updated 2 years ago
- ☆18 Aug 19, 2024 Updated last year
- The open-sourced code from the Decent AI mobile app, built with Expo ☆14 Apr 3, 2025 Updated last year
- Implementation of "PaLM-E: An Embodied Multimodal Language Model" ☆337 Jan 29, 2024 Updated 2 years ago
- Audio-visual diarization pipeline used for creating VoxConverse dataset ☆22 Jun 6, 2025 Updated last year
- ☆13 Sep 20, 2023 Updated 2 years ago
- Implementation of MambaFormer in PyTorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin… ☆22 Updated this week
- Ultra Fast Multi-Modality Vector Database ☆18 Feb 21, 2024 Updated 2 years ago
- ☆13 Jul 20, 2022 Updated 3 years ago
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ☆73 Nov 4, 2025 Updated 7 months ago
- PyTorch implementation of a neural network capable of recognizing finger patterns on the frets of a guitar. ☆10 Sep 26, 2021 Updated 4 years ago
- Task-Focused Few-Shot Object Detection Benchmark ☆14 Jun 24, 2025 Updated last year
- Generate interleaved text and image content in a structured format you can directly pass to downstream APIs. ☆29 Oct 18, 2024 Updated last year
- ☆19 Dec 7, 2024 Updated last year
- A simpler PyTorch + Zeta Implementation of the paper: "SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series… ☆29 Nov 11, 2024 Updated last year
- ☆15 Jun 8, 2026 Updated 3 weeks ago
- [CVPR 2023] HierVL: Learning Hierarchical Video-Language Embeddings ☆46 Aug 14, 2023 Updated 2 years ago
- Official Implementation for "SiLVR: A Simple Language-based Video Reasoning Framework" ☆19 Jan 18, 2026 Updated 5 months ago
- Repository for Nature Communications paper entitled "Sleep-like Unsupervised Replay Reduces Catastrophic Forgetting in Artificial Neural … ☆15 Oct 28, 2022 Updated 3 years ago
- A web interface for the Bee AI that uses your API key ☆20 Jul 25, 2025 Updated 11 months ago
- ☆15 Apr 26, 2025 Updated last year
- ☆18 May 24, 2026 Updated last month
- [ICCV2023 Oral] Implicit Temporal Modeling with Learnable Alignment for Video Recognition ☆41 Nov 29, 2023 Updated 2 years ago
- A conda-smithy repository for colmap. ☆15 Jun 15, 2026 Updated 2 weeks ago
- The Continual Learning App ☆14 Nov 3, 2021 Updated 4 years ago
- MCP SERVER ☆43 Mar 28, 2026 Updated 3 months ago
- ☆17 Jun 15, 2022 Updated 4 years ago