KastanDay / video-pretrained-transformer
Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scratch on YouTube (YT-1B dataset).
☆54 · Apr 21, 2023 · Updated 2 years ago
Alternatives and similar repositories for video-pretrained-transformer
Users who are interested in video-pretrained-transformer are comparing it to the libraries listed below.
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆27 · Jan 17, 2026 · Updated last month
- A guide to structured generation using constrained decoding ☆14 · Jun 9, 2024 · Updated last year
- [ECCV2022] A PyTorch implementation of the paper "Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embo… ☆13 · Mar 20, 2023 · Updated 2 years ago
- [NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale ☆203 · Nov 13, 2023 · Updated 2 years ago
- Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta ☆16 · Nov 11, 2024 · Updated last year
- ☆58 · Dec 2, 2025 · Updated 2 months ago
- Audio-visual diarization pipeline used for creating VoxConverse dataset ☆21 · Jun 6, 2025 · Updated 8 months ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers. ☆50 · Jun 16, 2023 · Updated 2 years ago
- ☆54 · Apr 24, 2024 · Updated last year
- Generate interleaved text and image content in a structured format you can directly pass to downstream APIs. ☆29 · Oct 18, 2024 · Updated last year
- Pytorch Implementation of Deepmind's SIMA: "Scaling Instructable Agents Across Many Simulated Worlds" ☆29 · Jun 17, 2024 · Updated last year
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin… ☆21 · Feb 9, 2026 · Updated last week
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆150 · Sep 10, 2024 · Updated last year
- Video shot transition detection ☆25 · Mar 9, 2023 · Updated 2 years ago
- A simpler Pytorch + Zeta Implementation of the paper: "SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series… ☆28 · Nov 11, 2024 · Updated last year
- Implementation of the paper: "BRAVE : Broadening the visual encoding of vision-language models" ☆26 · Feb 6, 2026 · Updated last week
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval ☆64 · Jun 19, 2024 · Updated last year
- Supercharged BLIP-2 that can handle videos ☆123 · Dec 1, 2023 · Updated 2 years ago
- Implementation of DropCov as described in DropCov: A Simple yet Effective Method for Improving Deep Architectures ☆10 · Oct 15, 2022 · Updated 3 years ago
- Uncovering Selective State Space Model's Capabilities in Lifelong Sequential Recommendation ☆34 · May 8, 2024 · Updated last year
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024) ☆32 · Mar 29, 2024 · Updated last year
- Implementation of the model from "Faster sorting algorithms discovered using deep reinforcement learning" that discovered an all-new ult… ☆11 · Aug 29, 2023 · Updated 2 years ago
- PyTorch implementation for "Rethinking Low-quality Optical Flow in Unsupervised Surgical Instrument Segmentation" ☆10 · Apr 11, 2024 · Updated last year
- Repository dedicated to developing a robust and modular framework for Multi-Agent Reinforcement Learning (MARL) algorithms. ☆13 · Mar 3, 2024 · Updated last year
- ☆34 · Jun 2, 2023 · Updated 2 years ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆42 · Mar 11, 2025 · Updated 11 months ago
- underwater dataset, open-data ☆11 · Aug 22, 2021 · Updated 4 years ago
- Progress Web App template for Scripture App Builder ☆13 · Updated this week
- ☆13 · Mar 21, 2024 · Updated last year
- An Email Spam Classifier project, helps you detect your spam email from correct email. Try it out here! ☆12 · Jun 16, 2023 · Updated 2 years ago
- ☆11 · Dec 24, 2019 · Updated 6 years ago
- Official repository for "Pre- to Post-Contrast Breast MRI Synthesis for Enhanced Tumour Segmentation" ☆12 · Jan 31, 2024 · Updated 2 years ago
- Hierarchical Universal Modular ANotator ☆11 · Feb 7, 2026 · Updated last week
- Official implementation of the paper "Light Transport-aware Diffusion Posterior Sampling for Single View Reconstruction of Volumes" ☆17 · Aug 1, 2025 · Updated 6 months ago
- A future game about space. Currently state-of-the-art Bevy-Lunex UI implementation. ☆12 · Sep 17, 2023 · Updated 2 years ago
- A pre-commit hook for Pyrefly. ☆23 · Feb 10, 2026 · Updated last week
- The processingjs.org website ☆29 · Jun 5, 2020 · Updated 5 years ago
- Code and software used to design de novo protein nanomachines. Supplementary material for "Computational design of nanoscale rotational m… ☆10 · Mar 19, 2022 · Updated 3 years ago
- Retrieval Augmented Generation, but no servers involved. Backed by S3 ☆12 · Nov 3, 2023 · Updated 2 years ago