akaihaoshuai / crawler_paper
Crawls paper lists from conference web pages such as ICCV and retrieves the related arXiv information
☆14Updated 2 years ago
Alternatives and similar repositories for crawler_paper
Users who are interested in crawler_paper are comparing it to the libraries listed below
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)☆72Updated last year
- Chinese Vision-Language Understanding Evaluation☆24Updated 9 months ago
- ☆18Updated last year
- Parameter-Efficient Fine-Tuning for Foundation Models☆93Updated 6 months ago
- Miscellaneous material shared by breezedeus☆22Updated 2 years ago
- ☆92Updated 3 weeks ago
- The Hugging Face implementation of the Fine-grained Late-interaction Multi-modal Retriever.☆98Updated 4 months ago
- Code for our Paper "All in an Aggregated Image for In-Image Learning"☆29Updated last year
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training …☆57Updated 5 months ago
- [NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model☆88Updated last year
- A collection of strong multimodal models for building multimodal AGI agents☆43Updated last year
- Reading list for Multimodal Large Language Models☆68Updated 2 years ago
- (ACM MM24) This is the official repository of GIST: Improving Parameter Efficient Fine Tuning via Knowledge Interaction.☆11Updated last year
- ☆50Updated 7 months ago
- [ACL 2024] ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning.☆130Updated last year
- Official Repository of "AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning"☆56Updated 2 months ago
- [ICLR 2023] This is the code repo for our ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa…☆53Updated last year
- ☆31Updated last year
- Code for the paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement"☆35Updated last year
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations☆100Updated 3 weeks ago
- ☆36Updated last year
- ☆57Updated last year
- ☆48Updated last year
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models☆42Updated last year
- All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)☆165Updated last year
- [Paper][AAAI2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations☆150Updated last year
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"☆137Updated last year
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆103Updated last year
- LAVIS - A One-stop Library for Language-Vision Intelligence☆48Updated last year
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆57Updated last month