MBZUAI-LLM / web2code
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
☆78Updated 5 months ago
Alternatives and similar repositories for web2code:
Users who are interested in web2code are comparing it to the repositories listed below
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆99Updated last month
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆42Updated 9 months ago
- ☆73Updated last year
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"☆131Updated 5 months ago
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR2024]☆214Updated 3 weeks ago
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"☆151Updated 3 weeks ago
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR2025]☆62Updated this week
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆54Updated 5 months ago
- ☆169Updated 9 months ago
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"☆52Updated 5 months ago
- ☆83Updated last week
- ☆43Updated this week
- ☆49Updated last year
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆69Updated 7 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer☆221Updated last year
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"☆174Updated 6 months ago
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"☆35Updated 2 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆34Updated 9 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆52Updated this week
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models☆181Updated last week
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆120Updated 5 months ago
- ☆71Updated 3 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆60Updated 5 months ago
- Explore the Limits of Omni-modal Pretraining at Scale☆97Updated 7 months ago
- ☆38Updated 3 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆42Updated last month
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆201Updated 3 months ago
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"☆129Updated 10 months ago
- Official repo for StableLLAVA☆95Updated last year
- ☆133Updated last year