princeton-nlp / PTPLinks

Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073

☆29

Alternatives and similar repositories for PTP

Users that are interested in PTP are comparing it to the libraries listed below

Sorting:

psunlpgroup / VisOnlyQA
This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…
☆24Updated 3 weeks ago
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆51Updated last month
dqxiu / KAssess
☆14Updated last year
wwxu21 / CUT
Source code of "Reasons to Reject? Aligning Language Models with Judgments"
☆58Updated last year
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48Updated last year
GAIR-NLP / weak-to-strong-reasoning
☆59Updated 11 months ago
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆102Updated 4 months ago
RUCAIBox / JiuZhang3.0
The code and data for the paper JiuZhang3.0
☆48Updated last year
RUCAIBox / BAMBOO
☆35Updated last year
GAIR-NLP / benbench
Benchmarking Benchmark Leakage in Large Language Models
☆55Updated last year
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆81Updated 11 months ago
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75Updated 2 months ago
hkust-nlp / llm-compression-intelligence
Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆139Updated 10 months ago
CriticBench / CriticBench
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
☆27Updated last year
GAIR-NLP / MetaCritique
Evaluate the Quality of Critique
☆36Updated last year
RenzeLou / AAAR-1.0
The source code for running LLMs on the AAAR-1.0 benchmark.
☆17Updated 3 months ago
fangyuan-ksgk / CoT-Reasoning-without-Prompting
Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting
☆32Updated last year
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆63Updated 7 months ago
feyzaakyurek / rl4f
Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023.
☆64Updated 8 months ago
RenzeLou / Muffin
MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following
☆16Updated 9 months ago
YuxiXie / SelfEval-Guided-Decoding
☆99Updated last year
DAMO-NLP-SG / LongPO
[ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
☆38Updated 5 months ago
dvlab-research / Mr-Ben
This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"
☆50Updated 9 months ago
TobiasLee / VEC
Visual and Embodied Concepts evaluation benchmark
☆21Updated last year
declare-lab / LLM-PuzzleTest
This repository is maintained to release dataset and models for multimodal puzzle reasoning.
☆99Updated 5 months ago
TianHongZXY / CoRe
[ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement)
☆50Updated last year
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆162Updated 2 months ago
microsoft / LEMA
official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner"
☆59Updated last year
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆58Updated 9 months ago
LuLuLuyi / LongHeads
[EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor
☆29Updated last year