qiancheng0 / ModelingAgent
☆16 Updated last month
Alternatives and similar repositories for ModelingAgent
Users that are interested in ModelingAgent are comparing it to the libraries listed below
- DataSciBench: An LLM Agent Benchmark for Data Science ☆22 Updated 5 months ago
- An LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization ☆13 Updated 6 months ago
- A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models ☆14 Updated last month
- Official Code Release for "Training a Generally Curious Agent" ☆28 Updated 2 months ago
- [EMNLP 2024] RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning ☆15 Updated 2 months ago
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25 ☆41 Updated last month
- The code for "Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling" ☆11 Updated 2 years ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data ☆41 Updated 5 months ago
- ☆13 Updated 2 weeks ago
- ☆20 Updated last year
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World". ☆28 Updated 3 months ago
- Code for paper: "LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits" ☆13 Updated 9 months ago
- ☆19 Updated 10 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆60 Updated 5 months ago
- [ICLR 2025] Code for the paper "Implicit Search via Discrete Diffusion: A Study on Chess" ☆29 Updated 4 months ago
- ☆11 Updated last year
- Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges ☆18 Updated 2 months ago
- ACL24 ☆10 Updated last year
- ☆20 Updated last year
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆27 Updated 4 months ago
- This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration. ☆13 Updated last year
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs ☆27 Updated last month
- ☆23 Updated 3 months ago
- Our paper is titled "NUS-IDS at FinCausal 2021: Dependency Tree in Graph Neural Networks for better Cause-Effect Span Detection". ☆13 Updated 3 years ago
- CodeUltraFeedback: aligning large language models to coding preferences ☆71 Updated last year
- This is the repository for paper EscapeBench: Pushing Language Models to Think Outside the Box ☆14 Updated 7 months ago
- Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning" ☆10 Updated 6 months ago
- Official implementation of ICML 2025 paper "Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment" (https:… ☆25 Updated 2 months ago
- A comprehensive and efficient long-context model evaluation framework ☆15 Updated this week
- ☆50 Updated last month