JiauZhang / tracking-arxiv
WeChat official account: 机器感知 | Tracking the Latest Arxiv Papers
☆37 Updated last year
Alternatives and similar repositories for tracking-arxiv:
Users who are interested in tracking-arxiv compare it to the repositories listed below.
- A subset of YFCC100M. Tools, checking scripts, and web drive links for downloading the datasets (uncompressed). ☆19 Updated 3 months ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆27 Updated last year
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a Vision Encoder. Unifying image understanding and generation. ☆35 Updated 7 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆47 Updated 6 months ago
- ☆52 Updated last year
- VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆26 Updated 7 months ago
- (ICLR 2024, CVPR 2024) SparseFormer ☆70 Updated 3 months ago
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora ☆40 Updated 10 months ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models ☆29 Updated 4 months ago
- [ICCV 2023] CLR: Channel-wise Lightweight Reprogramming for Continual Learning ☆29 Updated 8 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆40 Updated 7 months ago
- Official PyTorch Implementation of Self-emerging Token Labeling ☆32 Updated 10 months ago
- LMM which strictly superset LLM embedded ☆37 Updated 3 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 Updated 7 months ago
- ☆34 Updated last year
- ☆47 Updated last year
- A big_vision inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆34 Updated 7 months ago
- Official implementation of TagAlign ☆34 Updated 2 months ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di… ☆44 Updated 3 months ago
- ☆17 Updated last year
- ☆93 Updated 9 months ago
- A curated list of papers and resources for text-to-image evaluation. ☆27 Updated last year
- Masked Vision-Language Transformer in Fashion ☆33 Updated last year
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding ☆49 Updated last year
- ☆44 Updated last year
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 Updated 4 months ago
- Touchstone: Evaluating Vision-Language Models by Language Models ☆82 Updated last year
- Codebase for the paper "Elucidating the design space of language models for image generation" ☆45 Updated 2 months ago
- GIFT: Generative Interpretable Fine-Tuning ☆20 Updated 4 months ago