Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scratch on YouTube (YT-1B dataset).
☆54Apr 21, 2023Updated 2 years ago
Alternatives and similar repositories for video-pretrained-transformer
Users that are interested in video-pretrained-transformer are comparing it to the libraries listed below.
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆27Apr 13, 2026Updated last week
- [NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale☆209Nov 13, 2023Updated 2 years ago
- A guide to structured generation using constrained decoding☆14Jun 9, 2024Updated last year
- ☆58Dec 2, 2025Updated 4 months ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.☆50Jun 16, 2023Updated 2 years ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"☆154Sep 10, 2024Updated last year
- Official code for the LoG2022 paper -- MSGNN: A Spectral Graph Neural Network Based on a Novel Magnetic Signed Laplacian.☆14Feb 8, 2025Updated last year
- Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos☆28Dec 8, 2023Updated 2 years ago
- ☆18Aug 19, 2024Updated last year
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval☆66Jun 19, 2024Updated last year
- Implementation of "PaLM-E: An Embodied Multimodal Language Model"☆334Jan 29, 2024Updated 2 years ago
- Offline-first, decentralized graph database of collaborative Web apps☆15May 12, 2017Updated 8 years ago
- Audio-visual diarization pipeline used for creating VoxConverse dataset☆21Jun 6, 2025Updated 10 months ago
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin…☆20Updated this week
- This is where I write RL related stuff from scratch☆10Dec 15, 2019Updated 6 years ago
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r…☆74Nov 4, 2025Updated 5 months ago
- Simulates agent path planning using A* and Q-Learning in a 2D grid☆12Apr 5, 2014Updated 12 years ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆44Mar 11, 2025Updated last year
- Official repo of the paper “AL-GTD: Deep Active Learning for Gaze Target Detection” (ACMMM2024)☆12Nov 29, 2024Updated last year
- Source code for AAAI2019 paper "Cash-out User Detection based on Attributed Heterogeneous Information Network with a Hierarchical Attenti…☆15Nov 12, 2018Updated 7 years ago
- DeepSeek-V3, R1 671B on 8xH100 Throughput Benchmarks☆21Mar 13, 2025Updated last year
- ☆13Dec 15, 2025Updated 4 months ago
- [WWW '24] UnifiedSSR: A Unified Framework of Sequential Search and Recommendation☆12Feb 16, 2024Updated 2 years ago
- Official Implementation for "SiLVR : A Simple Language-based Video Reasoning Framework"☆19Jan 18, 2026Updated 3 months ago
- [3DV 2025] VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition☆19Mar 18, 2025Updated last year
- ☆15Apr 26, 2025Updated 11 months ago
- This is official code implementation of the <Revisiting Neural Networks for Continual Learning: An Architectural Perspective> in IJCAI 20…☆13Nov 25, 2024Updated last year
- [ICCV2023 Oral] Implicit Temporal Modeling with Learnable Alignment for Video Recognition☆41Nov 29, 2023Updated 2 years ago
- A conda-smithy repository for colmap.☆15Apr 6, 2026Updated last week
- Code for XPERT algorithm from Personalized Retrieval over Millions of Items☆13Sep 14, 2023Updated 2 years ago
- SimOn: A Simple Framework for Online Temporal Action Localization☆22Nov 12, 2022Updated 3 years ago
- Research Notes☆11Sep 13, 2020Updated 5 years ago
- Semantic Segmentation for CityScapes dataset, Pyramid Scene Parsing Network☆11Nov 7, 2020Updated 5 years ago
- Harvard AM205: group activity files☆18Nov 1, 2021Updated 4 years ago
- [CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning"☆35Feb 2, 2024Updated 2 years ago
- Robotic Arm learns to approach objects using Deep Reinforcement Learning☆12Jun 21, 2023Updated 2 years ago
- Code Repository for Paper "HRGCN: Heterogeneous Graph-level Anomaly Detection with Hierarchical Relation-augmented Graph Neural Networks"☆16Sep 24, 2023Updated 2 years ago
- ☆21May 11, 2025Updated 11 months ago
- Uncovering Selective State Space Model's Capabilities in Lifelong Sequential Recommendation☆34May 8, 2024Updated last year