qiancheng0 / ModelingAgent
☆18Updated last month
Alternatives and similar repositories for ModelingAgent
Users who are interested in ModelingAgent are comparing it to the libraries listed below.
- [EMNLP 2024] RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning☆14Updated 5 months ago
- AutoLibra: Metric Induction for Agents from Open-Ended Human Feedback☆15Updated 2 weeks ago
- The OlymMATH dataset☆20Updated 4 months ago
- FinanceRAG project by KAIST students. Advanced Retrieval-Augmented Generation (RAG) system designed for the financial domain.☆15Updated 8 months ago
- The official implementation of the paper "Large Scale Knowledge Washing"☆10Updated last year
- ☆20Updated last year
- DataSciBench: An LLM Agent Benchmark for Data Science☆35Updated last month
- Official Code Release for "Training a Generally Curious Agent"☆35Updated 5 months ago
- ☆20Updated last year
- This is the repository for the paper "EscapeBench: Pushing Language Models to Think Outside the Box"☆16Updated 10 months ago
- [ACL'25] UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench☆33Updated 2 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings"☆64Updated 8 months ago
- The official code for NAACL 2024 paper: $E^5$: Zero-shot Hierarchical Table Analysis using Augmented LLMs via Explain, Extract, Execute, …☆15Updated last year
- A symbolic benchmark for verifiable chain-of-thought financial reasoning. Includes executable templates, 58 topics across 12 domains, and…☆19Updated last week
- Official repository of paper "Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models"☆22Updated 5 months ago
- Official code for "Decoding-Time Language Model Alignment with Multiple Objectives".☆27Updated last year
- Reinforced Multi-LLM Agents training☆56Updated 4 months ago
- ☆63Updated 4 months ago
- This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration.☆13Updated last year
- QRHead: Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking☆24Updated last month
- Code for paper "Prompt Engineering a Prompt Engineer" (https://arxiv.org/abs/2311.05661)☆10Updated last year
- OptiBench and ReSocratic Synthesis Method☆26Updated 3 weeks ago
- ☆12Updated 8 months ago
- An LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization☆16Updated 10 months ago
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering☆62Updated 10 months ago
- Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?"☆16Updated 5 months ago
- Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"☆115Updated 2 months ago
- ☆69Updated last month
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples☆108Updated 3 months ago
- MPO: Boosting LLM Agents with Meta Plan Optimization (EMNLP 2025 Findings)☆73Updated 2 months ago