GX-XinGao / GRA
The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis"
☆21 Updated 3 weeks ago
Alternatives and similar repositories for GRA:
Users that are interested in GRA are comparing it to the libraries listed below
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs ☆25 Updated 6 months ago
- Official Repository of Are Your LLMs Capable of Stable Reasoning? ☆25 Updated last month
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆35 Updated 2 months ago
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆24 Updated 7 months ago
- MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion ☆19 Updated last month
- Large Language Models Can Self-Improve in Long-context Reasoning ☆69 Updated 5 months ago
- ☆24 Updated 3 weeks ago
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models ☆42 Updated 5 months ago
- [arxiv: 2505.02156] Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents ☆15 Updated this week
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆21 Updated 2 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆93 Updated this week
- ☆40 Updated this week
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning ☆42 Updated 9 months ago
- ☆16 Updated 9 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆56 Updated 6 months ago
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073 ☆28 Updated 10 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆47 Updated 4 months ago
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" [NeurIPS D&B, 2024] ☆57 Updated 3 months ago
- ☆17 Updated 4 months ago
- Exploration of automated dataset selection approaches at large scales. ☆40 Updated 2 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆62 Updated 6 months ago
- ☆22 Updated 4 months ago
- This repository introduces a comprehensive paper list, datasets, methods and tools for memory research. ☆49 Updated this week
- Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models ☆35 Updated 7 months ago
- The source code for running LLMs on the AAAR-1.0 benchmark. ☆16 Updated last month
- ☆37 Updated 3 weeks ago
- Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning". (By Xinghao Chen) ☆15 Updated 2 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆48 Updated 10 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 Updated 4 months ago
- An Easy-to-use Hallucination Detection Framework for LLMs. ☆58 Updated last year