declare-lab / LLM-PuzzleTest
This repository is maintained to release dataset and models for multimodal puzzle reasoning.
☆44Updated 3 months ago
Related projects ⓘ
Alternatives and complementary repositories for LLM-PuzzleTest
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"☆85Updated last month
- The HELMET Benchmark☆67Updated last week
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆127Updated last month
- ☆89Updated 4 months ago
- Benchmarking Benchmark Leakage in Large Language Models☆44Updated 5 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆67Updated last month
- A repository for research on medium sized language models.☆74Updated 5 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆46Updated 3 weeks ago
- Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award)☆56Updated last month
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning☆30Updated 3 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆48Updated 7 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models☆102Updated 6 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"☆113Updated last week
- LL3M: Large Language and Multi-Modal Model in Jax☆64Updated 6 months ago
- Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"☆23Updated 5 months ago
- Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"☆63Updated 9 months ago
- The official repository of the Omni-MATH benchmark.☆47Updated last week
- Self-Alignment with Principle-Following Reward Models☆148Updated 8 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆44Updated 9 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024)☆50Updated 6 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment"☆67Updated 5 months ago
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers"☆40Updated last month
- ☆34Updated 3 months ago
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models"☆82Updated 8 months ago
- ☆61Updated 2 months ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"☆24Updated 6 months ago
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073☆26Updated 4 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators☆41Updated 8 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆95Updated 2 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆61Updated 3 weeks ago