luohongyin / LangCode
LangCode - Improving alignment and reasoning of large language models (LLMs) with natural language embedded programs (NLEP).
☆38Updated 11 months ago
Related projects:
- ☆42Updated 2 months ago
- Repository for paper Tools Are Instrumental for Language Agents in Complex Environments☆32Updated 8 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions☆26Updated 11 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆44Updated 8 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models☆84Updated 11 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation☆27Updated 2 months ago
- Codebase accompanying the Summary of a Haystack paper.☆65Updated 2 months ago
- Advanced Reasoning Benchmark Dataset for LLMs☆45Updated 10 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…☆81Updated last month
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆73Updated 2 months ago
- ☆17Updated 6 months ago
- Based on the tree of thoughts paper☆45Updated last year
- Official implementation of the ACL 2024 paper "Scientific Inspiration Machines Optimized for Novelty"☆63Updated 5 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators☆39Updated 7 months ago
- [SIGIR 2024 (Demo)] CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models☆22Updated 7 months ago
- ☆34Updated last month
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions☆68Updated last year
- A Retrieval Benchmark for Scientific Literature Search☆53Updated 2 months ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data☆28Updated last month
- ☆27Updated 5 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆39Updated 3 weeks ago
- A set of utilities for running few-shot prompting experiments on large-language models☆106Updated 10 months ago
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation☆42Updated 8 months ago
- Official implementation for <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>, accepted by ACL 2024.☆32Updated 3 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"☆76Updated 6 months ago
- Functional Benchmarks and the Reasoning Gap☆74Updated last month
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆30Updated last month
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆45Updated 6 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information.☆107Updated 2 weeks ago
- ☆16Updated 6 months ago