princeton-nlp / PTP
Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073
☆28 · Updated 8 months ago
Alternatives and similar repositories for PTP:
Users interested in PTP are comparing it to the repositories listed below.
- ☆59, updated 6 months ago
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…" (☆22, updated 3 months ago)
- Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" (☆50, updated 5 months ago)
- Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning" (☆24, updated 9 months ago)
- Source code for running LLMs on the AAAR-1.0 benchmark (☆16, updated 3 weeks ago)
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models (☆79, updated 8 months ago)
- Code and data for the paper JiuZhang3.0 (☆42, updated 9 months ago)
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling (☆45, updated 2 months ago)
- Unofficial implementation of Chain-of-Thought Reasoning Without Prompting (☆28, updated last year)
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization (☆31, updated 3 weeks ago)
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) (☆56, updated 5 months ago)
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" [NeurIPS D&B 2024] (☆55, updated 2 months ago)
- PyTorch code for the paper "An Empirical Study of Multimodal Model Merging" (☆38, updated last year)
- MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following (☆15, updated 4 months ago)
- Codebase for Instruction Following without Instruction Tuning (☆33, updated 6 months ago)
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy (☆55, updated 3 months ago)
- Code for the paper "Teaching Language Models to Critique via Reinforcement Learning" (☆84, updated last month)
- Evaluate the Quality of Critique (☆35, updated 9 months ago)
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" (☆46, updated last year)
- [ICLR 2023] Code for the paper "Selective Annotation Makes Language Models Better Few-Shot Learners" (☆109, updated last year)
- ☆29, updated last year
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation" (☆35, updated 8 months ago)
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI (☆96, updated 2 weeks ago)
- M-STAR (Multimodal Self-Evolving TrAining for Reasoning) project: diving into self-evolving training for multimodal reasoning (☆55, updated 3 months ago)
- [NeurIPS 2024] A comprehensive benchmark for evaluating the critique ability of LLMs (☆39, updated 3 months ago)
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs (☆23, updated 6 months ago)
- DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems (☆22, updated 5 months ago)
- Official code of IdealGPT (☆34, updated last year)
- [NeurIPS 2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies. https://arxiv.org/abs/2407.13623 (☆80, updated 6 months ago)
- Source code for the paper "Prefix Language Models are Unified Modal Learners" (☆43, updated last year)