Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scratch on YouTube (YT-1B dataset).
☆54Apr 21, 2023Updated 3 years ago
Alternatives and similar repositories for video-pretrained-transformer
Users that are interested in video-pretrained-transformer are comparing it to the libraries listed below.
- [NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale☆209Nov 13, 2023Updated 2 years ago
- PyTorch implementation of Semantics-Assisted Video Captioning☆11Feb 16, 2023Updated 3 years ago
- Code for DVD A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue☆14Oct 12, 2021Updated 4 years ago
- A guide to structured generation using constrained decoding☆18Jun 9, 2024Updated last year
- Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta☆16Nov 11, 2024Updated last year
- Code for "Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation"☆25Mar 9, 2024Updated 2 years ago
- ☆58Dec 2, 2025Updated 5 months ago
- ☆55Apr 24, 2024Updated 2 years ago
- Extension of hLSTMat☆19Apr 15, 2021Updated 5 years ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"☆154Sep 10, 2024Updated last year
- Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos☆28Dec 8, 2023Updated 2 years ago
- ☆18Aug 19, 2024Updated last year
- A much more powerful probing method to tune your model, with promising performance at linear-probing training cost!☆15Jul 26, 2023Updated 2 years ago
- Offline-first, decentralized graph database of collaborative Web apps☆15May 12, 2017Updated 8 years ago
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin…☆21Updated this week
- Ultra Fast Multi-Modality Vector Database☆18Feb 21, 2024Updated 2 years ago
- ☆13Jul 20, 2022Updated 3 years ago
- Multimodal and multilingual topic model with pretrained embeddings☆12Apr 11, 2023Updated 3 years ago
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r…☆73Nov 4, 2025Updated 6 months ago
- Simulates agent path planning using A* and Q-Learning in a 2D grid☆12Apr 5, 2014Updated 12 years ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆44Mar 11, 2025Updated last year
- Self-supervised place recognition by exploring temporal and feature neighborhoods☆16Dec 9, 2024Updated last year
- Contrastive Learning Reduces Hallucination in Conversations☆25Oct 17, 2023Updated 2 years ago
- Generate interleaved text and image content in a structured format you can directly pass to downstream APIs.☆29Oct 18, 2024Updated last year
- Source code for AAAI2019 paper "Cash-out User Detection based on Attributed Heterogeneous Information Network with a Hierarchical Attenti…☆15Nov 12, 2018Updated 7 years ago
- A simpler Pytorch + Zeta Implementation of the paper: "SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series…☆29Nov 11, 2024Updated last year
- ☆13Dec 15, 2025Updated 4 months ago
- underwater dataset, open-data☆12Aug 22, 2021Updated 4 years ago
- [CVPR 2023] HierVL Learning Hierarchical Video-Language Embeddings☆46Aug 14, 2023Updated 2 years ago
- Repository for Nature Communications paper entitled "Sleep-like Unsupervised Replay Reduces Catastrophic Forgetting in Artificial Neural …☆15Oct 28, 2022Updated 3 years ago
- ☆18Aug 1, 2025Updated 9 months ago
- Code for XPERT algorithm from Personalized Retrieval over Millions of Items☆13Sep 14, 2023Updated 2 years ago
- The Shape of Data: Intrinsic Distance for Comparing Data Distributions☆12Sep 25, 2019Updated 6 years ago
- ☆15Oct 29, 2019Updated 6 years ago
- Research Notes☆11Sep 13, 2020Updated 5 years ago
- ☆87Mar 4, 2024Updated 2 years ago
- [CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning"☆35Feb 2, 2024Updated 2 years ago
- A swarm of LLM agents that will help you test, document, and productionize your code!☆18Apr 27, 2026Updated last week
- ☆33Aug 19, 2023Updated 2 years ago