zjunlp / WorFBench
Benchmarking Agentic Workflow Generation
☆29Updated 2 weeks ago
Related projects ⓘ
Alternatives and complementary repositories for WorFBench
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆30Updated 9 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆39Updated last month
- An Easy-to-use Hallucination Detection Framework for LLMs.☆48Updated 7 months ago
- ☆42Updated 2 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆47Updated last month
- [ICML 2024 Oral] A framework for society simulation that supports complex simulation, for example: multi-scene.☆52Updated 3 months ago
- Codebase for Instruction Following without Instruction Tuning☆32Updated last month
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆99Updated 3 weeks ago
- Code and data for the benchmark "Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Lan…☆34Updated 4 months ago
- ☆62Updated 3 weeks ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆75Updated last month
- Scalable Meta-Evaluation of LLMs as Evaluators☆41Updated 9 months ago
- InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales☆54Updated last week
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; arXiv preprint arXiv:2403.…☆37Updated 4 months ago
- MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆17Updated 2 weeks ago
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.☆76Updated 9 months ago
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents☆46Updated 2 weeks ago
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers"☆40Updated last month
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"☆86Updated last month
- Repository for paper Tools Are Instrumental for Language Agents in Complex Environments☆32Updated last month
- official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner"☆54Updated 11 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆44Updated 10 months ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers☆75Updated last month
- Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System"☆36Updated last week
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆63Updated last month
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆127Updated 2 months ago
- Code implementation of synthetic continued pretraining☆60Updated last month
- ☆56Updated 9 months ago
- Reproduction of "RLCD Reinforcement Learning from Contrast Distillation for Language Model Alignment☆65Updated last year
- ☆22Updated 2 months ago