InternLM / AlchemistCoder
☆35 Updated 5 months ago
Alternatives and similar repositories for AlchemistCoder:
Users interested in AlchemistCoder are comparing it to the repositories listed below:
- ☆73 Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆40 Updated 8 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆95 Updated 2 weeks ago
- Official repository of MMDU dataset ☆86 Updated 5 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆74 Updated 4 months ago
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆65 Updated 2 weeks ago
- ☆28 Updated 6 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆24 Updated 2 months ago
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆98 Updated 2 months ago
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆44 Updated 9 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆69 Updated last month
- ☆28 Updated 5 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆30 Updated 3 months ago
- Touchstone: Evaluating Vision-Language Models by Language Models ☆82 Updated last year
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆59 Updated 4 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆62 Updated 3 months ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆114 Updated 3 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 Updated 5 months ago
- ☆60 Updated last year
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆57 Updated 5 months ago
- [NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs ☆39 Updated 3 months ago
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr… ☆93 Updated 4 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆103 Updated 7 months ago
- A Survey on Benchmarks of Multimodal Large Language Models ☆91 Updated 2 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆48 Updated 8 months ago
- A simple reproducible template to implement AI research papers ☆23 Updated 6 months ago
- ☆133 Updated last year
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆64 Updated 6 months ago
- ☆66 Updated 2 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆41 Updated 2 weeks ago