JiauZhang / tracking-arxiv
WeChat official account: 机器感知 (Machine Perception) | Tracking the Latest Arxiv Papers
☆37Updated 9 months ago
Related projects
Alternatives and complementary repositories for tracking-arxiv
- ☆35Updated 4 months ago
- An interactive demo based on Segment-Anything for stroke-based painting which enables human-like painting.☆34Updated last year
- The official implementation of the paper "Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation"☆16Updated 4 months ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models☆28Updated last month
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling.☆29Updated 4 months ago
- ☆52Updated last year
- [CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model☆16Updated 6 months ago
- LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever.☆18Updated this week
- An LMM that is a strict superset of its embedded LLM☆31Updated last week
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation☆15Updated this week
- ☆19Updated last year
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video…☆27Updated 4 months ago
- OpenMMLab Detection Toolbox and Benchmark for V3Det☆15Updated 7 months ago
- [ICCV 2023] CLR: Channel-wise Lightweight Reprogramming for Continual Learning☆29Updated 5 months ago
- (ECCV 2024) Can OOD Object Detectors Learn from Foundation Models?☆17Updated 2 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆32Updated 4 months ago
- Official Pytorch Implementation of Self-emerging Token Labeling☆30Updated 7 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆38Updated 4 months ago
- Awesome-DragGAN: A curated list of papers, tutorials, repositories related to DragGAN☆82Updated last year
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones".☆23Updated 9 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆16Updated 3 weeks ago
- GIFT: Generative Interpretable Fine-Tuning☆18Updated last month
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆50Updated last month
- Masked Vision-Language Transformer in Fashion☆33Updated last year
- Moved to https://github.com/NUS-HPC-AI-Lab/InfoBatch☆6Updated 9 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆37Updated 6 months ago
- i-mae Pytorch Repo☆19Updated 7 months ago
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression☆38Updated 3 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆26Updated last month