SALT-NLP / DARG
The official repo for DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
☆14 · Updated 3 months ago
Alternatives and similar repositories for DARG:
Users who are interested in DARG are comparing it to the repositories listed below:
- Evaluate the Quality of Critique ☆35 · Updated 7 months ago
- Code/data for MARG (multi-agent review generation) ☆38 · Updated 2 months ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆35 · Updated last month
- ☆21 · Updated 2 weeks ago
- A curated list of papers on LLMs and agents for scientific research and development ☆28 · Updated last month
- Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning" ☆23 · Updated 7 months ago
- ☆20 · Updated 8 months ago
- [EMNLP 2024] This is the code for our paper "BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers". ☆21 · Updated 4 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆69 · Updated last month
- AbstainQA, ACL 2024 ☆25 · Updated 3 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 · Updated 11 months ago
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆68 · Updated last month
- Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs ☆17 · Updated 3 months ago
- The benchmark proposed in paper: GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability ☆19 · Updated 10 months ago
- The repository for ACL 2024 paper "TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models" ☆27 · Updated 7 months ago
- ✨ Resolving Knowledge Conflicts in Large Language Models, COLM 2024 ☆15 · Updated 3 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆28 · Updated 6 months ago
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆37 · Updated 3 months ago
- The source code for running LLMs on the AAAR-1.0 benchmark. ☆14 · Updated 2 weeks ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting ☆26 · Updated 10 months ago
- Official codebase for permutation self-consistency. ☆16 · Updated 11 months ago
- Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award) ☆76 · Updated 3 months ago
- Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting" ☆57 · Updated 6 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆31 · Updated 11 months ago
- Code for Benchmarking Language Model Agents for Data-Driven Science ☆22 · Updated 3 months ago
- ☆14 · Updated 4 months ago
- This is the implementation for the paper "LARGE LANGUAGE MODEL CASCADES WITH MIXTURE OF THOUGHT REPRESENTATIONS FOR COST-EFFICIENT REA… ☆19 · Updated 7 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆57 · Updated 8 months ago
- ☆23 · Updated last month
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆55 · Updated last month