zjykzj / MPDataset
Custom Iterable Dataset Class for Large-Scale Data Loading
☆13Updated 3 years ago
Alternatives and similar repositories for MPDataset:
Users who are interested in MPDataset are comparing it to the libraries listed below.
- ICCV DeeperAction Challenge - Kinetics-TPS Challenge on Part-level Action Parsing and Action Recognition.☆15Updated 3 years ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆26Updated last year
- Video action classification benchmark for common CNN architectures, implemented in PyTorch☆11Updated 3 years ago
- ☆10Updated 5 months ago
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P…☆25Updated last year
- ☆17Updated last year
- ☆34Updated 3 years ago
- A deep learning library to enable rapid prototyping☆36Updated last year
- Code for recreating the HoS benchmark of VISOR☆21Updated last year
- ICME2022 Special Session “Beyond Accuracy: Responsible, Responsive, and Robust Multimedia Retrieval”☆12Updated 10 months ago
- A Simple Framework for CV Pre-training Model (SOCO, VirTex, BEiT)☆15Updated 3 years ago
- ☆27Updated last month
- LAVIS - A One-stop Library for Language-Vision Intelligence☆47Updated 8 months ago
- Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.☆18Updated 2 years ago
- Official implementation of AdaMML. https://arxiv.org/abs/2105.05165.☆51Updated 3 years ago
- Market-1501 dataset with super-resolution quality☆19Updated 2 years ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆35Updated 10 months ago
- Code for Cross-dataset Training☆15Updated 4 years ago
- GPU task queuing☆2Updated last year
- LLaVA combines with Magvit Image tokenizer, training MLLM without a Vision Encoder. Unifying image understanding and generation.☆38Updated 10 months ago
- [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model☆43Updated 4 months ago
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.☆30Updated last year
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 7 months ago
- A practice for million-scale multi-domain universal object detection☆27Updated 10 months ago
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆37Updated last year
- Research code for "Training Vision-Language Transformers from Captions Alone"☆34Updated 2 years ago
- PyTorch 1.0 code (including CUDA code) for Deformable Convolution Version 2☆18Updated 6 years ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆16Updated 2 months ago
- ☆61Updated last year
- ☆13Updated 2 years ago