Quehry / HelloBench
HelloBench: evaluating long text generation capabilities of LLMs
☆29Updated 3 weeks ago
Related projects ⓘ
Alternatives and complementary repositories for HelloBench
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"☆85Updated last month
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"☆50Updated 6 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆35Updated 10 months ago
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning☆66Updated 10 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆46Updated 3 weeks ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models☆102Updated 6 months ago
- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs☆42Updated 4 months ago
- Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs☆22Updated last month
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"☆29Updated 2 weeks ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆44Updated 9 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆56Updated 8 months ago
- Code and data for CoachLM, an automatic instruction revision approach LLM instruction tuning.☆58Updated 7 months ago
- Official repository for paper "GTA: A Benchmark for General Tool Agents" (NeurIPS 2024 D&B Track)☆43Updated this week
- ☆29Updated this week
- The Official Code Repository for GUI-World.☆37Updated 3 months ago
- ☆62Updated last month
- Code and data releases for the paper -- DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory☆30Updated 2 weeks ago
- Code for Paper: Harnessing Webpage Uis For Text Rich Visual Understanding☆37Updated 3 weeks ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.☆47Updated 2 weeks ago
- FuseAI Project☆76Updated 2 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆61Updated 3 weeks ago
- ☆21Updated last week
- A scalable automated alignment method for large language models. Resources for "Aligning Large Language Models via Self-Steering Optimiza…☆10Updated 2 weeks ago
- Official implementation for "Law of the Weakest Link: Cross capabilities of Large Language Models"☆37Updated last month
- Codebase for Instruction Following without Instruction Tuning☆30Updated last month
- [NeurIPS 2024] Agent Planning with World Knowledge Model☆44Updated this week
- ☆15Updated 4 months ago
- ☆15Updated 3 months ago
- ☆49Updated 3 weeks ago
- Benchmarking Benchmark Leakage in Large Language Models☆44Updated 5 months ago