leolee99 / Online-CNCLIP
ChineseCLIP using online learning
☆12Updated 2 years ago
Alternatives and similar repositories for Online-CNCLIP:
Users who are interested in Online-CNCLIP are comparing it to the repositories listed below
- A subset of YFCC100M. Tools, checking scripts, and web-drive links for downloading the (uncompressed) datasets.☆18Updated 2 months ago
- Benchmarking Attention Mechanism in Vision Transformers.☆17Updated 2 years ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning☆19Updated 4 months ago
- The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…☆18Updated last year
- OpenMMLab Detection Toolbox and Benchmark for V3Det☆15Updated 9 months ago
- Unofficial implementation of "Pix2seq: A Language Modeling Framework for Object Detection" on mmdetection☆31Updated 2 years ago
- The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". Th…☆42Updated 2 months ago
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)☆31Updated last year
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning☆45Updated 8 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.☆38Updated 3 months ago
- IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model☆26Updated last month
- [CVPR2022 Oral] The official code for "TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognit…☆18Updated 2 years ago
- [CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model☆17Updated 9 months ago
- Teach-DETR: Better Training DETR with Teachers☆30Updated 10 months ago
- Code for the ICML 2023 paper "Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation"☆35Updated last year
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆14Updated 3 months ago
- Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".☆16Updated last year
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.☆29Updated last year
- [NeurIPS 2023] Towards Free Data Selection with General-Purpose Models☆34Updated 9 months ago
- Turning to Video for Transcript Sorting☆48Updated last year
- Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training☆15Updated last year
- Official implementation of TagAlign☆34Updated last month
- This is the official repo for ByteVideoLLM/Dynamic-VLM☆18Updated last month
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.☆21Updated 10 months ago
- Code of CropMix: Sampling a Rich Input Distribution via Multi-Scale Cropping☆17Updated 2 years ago
- [ICLR 23] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning☆37Updated last year