video description generation vision-language model
☆21Jan 21, 2025Updated last year
Alternatives and similar repositories for SpaceTimeGPT
Users that are interested in SpaceTimeGPT are comparing it to the libraries listed below.
- An official codebase for "NormLens: Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Comm…☆10May 9, 2024Updated 2 years ago
- ForeHOI: Feed-forward 3D Object Reconstruction from Daily Hand-Object Interaction Videos☆46Mar 6, 2026Updated 2 months ago
- [ECCV 2024] LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation☆15Dec 23, 2024Updated last year
- This is the official resources for ECCV 2022 paper "Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation From M…☆18Jun 15, 2023Updated 2 years ago
- ImageNet3D: Towards General-Purpose Object-Level 3D Understanding☆21Dec 6, 2024Updated last year
- GoTrack: Generic 6DoF Object Pose Refinement and Tracking, CV4MR 2025☆84Oct 14, 2025Updated 6 months ago
- Swift framework to interact with Python.☆18Dec 24, 2020Updated 5 years ago
- Project for SNARE benchmark☆11Jun 5, 2024Updated last year
- ☆15Aug 28, 2024Updated last year
- [CVPR'25] UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image☆41May 29, 2025Updated 11 months ago
- Official Implementation for "SiLVR : A Simple Language-based Video Reasoning Framework"☆19Jan 18, 2026Updated 3 months ago
- A Qwen .5B reasoning model trained on OpenR1-Math-220k☆14Oct 11, 2025Updated 6 months ago
- Self-Supervised Multi-Scale Transformer with Attention-Guided Fusion for Efficient Crack Detection☆27Jan 17, 2026Updated 3 months ago
- ☆15Aug 5, 2023Updated 2 years ago
- Fork of Open3D for iOS apps☆25Feb 6, 2022Updated 4 years ago
- ☆46May 24, 2025Updated 11 months ago
- Codebase for ACL 2023 paper "Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models' Memori…☆52Oct 8, 2023Updated 2 years ago
- [ICCV 2025] Object-centric Video Question Answering with Visual Grounding and Referring☆25Aug 8, 2025Updated 9 months ago
- Code for paper "W-RAG: Weakly Supervised Dense Retrieval in RAG for Open-domain Question Answering"☆15Oct 2, 2025Updated 7 months ago
- Code for MME-SID accepted to CIKM 2025 Full Research track.☆29Oct 29, 2025Updated 6 months ago
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery☆30Apr 16, 2024Updated 2 years ago
- ☆11Oct 25, 2020Updated 5 years ago
- ☆10Jun 19, 2023Updated 2 years ago
- ☆16Oct 9, 2024Updated last year
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight)☆23Aug 1, 2025Updated 9 months ago
- ☆25May 23, 2025Updated 11 months ago
- ☆15Aug 13, 2024Updated last year
- ☆31Oct 6, 2025Updated 7 months ago
- [EMNLP 2024 Industry track] MERLIN : Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank P…☆14Mar 4, 2025Updated last year
- ☆13Jul 23, 2024Updated last year
- code for downloading videos from HowTo100M dataset☆17May 13, 2021Updated 4 years ago
- ☆12Apr 6, 2021Updated 5 years ago
- The correct way to resize images or tensors. For Numpy or Pytorch (differentiable).☆18May 5, 2022Updated 4 years ago
- ☆22Oct 28, 2024Updated last year
- Doing style transfer with linguistic features using OpenAI's CLIP.☆14May 4, 2021Updated 5 years ago
- quagga☆10Apr 7, 2020Updated 6 years ago
- [EMNLP 2024] TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering☆18Oct 31, 2024Updated last year
- Code for the paper: "Invertible CNN-Based Super Resolution with Downsampling Awareness" by Andrew Geiss and Joseph C. Hardin, Nov 2020☆12Nov 11, 2020Updated 5 years ago
- Hybrid Deep Sequential Modeling for Social Text-Driven Stock Prediction-Dataset☆22Aug 19, 2018Updated 7 years ago