SALT-NLP / DARG
The official repo for DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
☆10 Updated last month
Related projects
Alternatives and complementary repositories for DARG
- Code/data for MARG (multi-agent review generation) ☆33 Updated last week
- Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting" ☆55 Updated 4 months ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data ☆30 Updated 3 months ago
- Repository for paper Tools Are Instrumental for Language Agents in Complex Environments ☆32 Updated last month
- SRTK: Retrieve semantic-relevant subgraphs from large-scale knowledge graphs ☆25 Updated last month
- A pipeline using LLMs for Knowledge Engineering, combining knowledge probing and Wikidata entity mapping. ☆33 Updated last year
- Evaluate the Quality of Critique ☆35 Updated 5 months ago
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆36 Updated 3 weeks ago
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment ☆38 Updated last year
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆49 Updated 8 months ago
- The benchmark proposed in paper: GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability ☆16 Updated 8 months ago
- AbstainQA, ACL 2024 ☆19 Updated last month
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆57 Updated 3 weeks ago
- Code for our paper "Graph Language Models" ☆59 Updated 2 months ago
- Official codebase for permutation self-consistency. ☆16 Updated 9 months ago
- InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability of information retrieval models. Our foc… ☆28 Updated 5 months ago
- Tree prompting: easy-to-use scikit-learn interface for improved prompting. ☆33 Updated last year
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆30 Updated 9 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆61 Updated 4 months ago
- Discovering Data-driven Hypotheses in the Wild ☆41 Updated this week
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod… ☆29 Updated 8 months ago
- Benchmarking Chat Assistants on Long-Term Interactive Memory ☆21 Updated this week
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective" ☆30 Updated 6 months ago
- Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking ☆16 Updated last year
- Supporting code for ReCEval paper ☆26 Updated 2 months ago