zjunlp / WorFBench
Benchmarking Agentic Workflow Generation
☆25Updated last week
Related projects ⓘ
Alternatives and complementary repositories for WorFBench
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆30Updated 9 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆38Updated last month
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆46Updated 3 weeks ago
- An Easy-to-use Hallucination Detection Framework for LLMs.☆48Updated 6 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model☆44Updated this week
- InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales☆51Updated 3 weeks ago
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"☆85Updated last month
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆44Updated 9 months ago
- ☆29Updated this week
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆96Updated last week
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆127Updated last month
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers☆68Updated 3 weeks ago
- [ICML 2024 Oral] A framework for society simulation that supports complex simulation, for example: multi-scene.☆50Updated 3 months ago
- ☆41Updated 2 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆75Updated 3 weeks ago
- [ACL 2024] The project of Symbol-LLM☆41Updated 4 months ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location.☆73Updated 3 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆67Updated last month
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models☆57Updated 7 months ago
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"☆29Updated 6 months ago
- Code implementation of synthetic continued pretraining☆54Updated last month
- Official repository for paper "GTA: A Benchmark for General Tool Agents" (NeurIPS 2024 D&B Track)☆43Updated this week
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models☆102Updated 6 months ago
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"☆38Updated 3 months ago
- ☆19Updated last month
- Scalable Meta-Evaluation of LLMs as Evaluators☆41Updated 8 months ago
- This is the oficial repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Updated 8 months ago
- ☆64Updated 7 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆48Updated 8 months ago
- ☆57Updated 2 weeks ago