google-deepmind / neptune
☆42 Updated 5 months ago
Alternatives and similar repositories for neptune:
Users who are interested in neptune are comparing it to the repositories listed below.
- Code for CVPR 2023 paper "Procedure-Aware Pretraining for Instructional Video Understanding" ☆48 Updated 3 weeks ago
- ☆83 Updated last year
- Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models". ☆101 Updated 8 months ago
- Language Repository for Long Video Understanding ☆31 Updated 8 months ago
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" ☆92 Updated 3 months ago
- [CVPR 2023] HierVL Learning Hierarchical Video-Language Embeddings ☆45 Updated last year
- ☆67 Updated 2 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆59 Updated 7 months ago
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆61 Updated 4 months ago
- Holistic evaluation of multimodal foundation models ☆42 Updated 6 months ago
- ☆41 Updated last year
- ☆53 Updated 2 months ago
- ☆64 Updated last year
- ☆89 Updated last year
- ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation) ☆14 Updated last year
- Video-LlaVA fine-tune for CinePile evaluation ☆46 Updated 6 months ago
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability. ☆89 Updated 2 months ago
- M4 experiment logbook ☆56 Updated last year
- ☆159 Updated 4 months ago
- ☆68 Updated 4 months ago
- [ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and inexpensive to use. Specific… ☆65 Updated 5 months ago
- Official repository for the General Robust Image Task (GRIT) Benchmark ☆51 Updated last year
- Recursive Visual Programming (ECCV 2024) ☆17 Updated 2 months ago
- In-the-wild Question Answering ☆15 Updated last year
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆19 Updated 3 weeks ago
- Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scra… ☆53 Updated last year
- Python Library to evaluate VLM models' robustness across diverse benchmarks ☆189 Updated last month
- Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta ☆16 Updated 3 months ago
- Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" ☆93 Updated 6 months ago
- [NeurIPS2024] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆29 Updated 2 months ago