3B-Group / ConvRe
ConvRe: An Investigation of LLMs’ Inefficacy in Understanding Converse Relations (EMNLP 2023)
⭐ 23 · Updated last year
Alternatives and similar repositories for ConvRe
Users who are interested in ConvRe are comparing it to the repositories listed below.
- The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning" · ⭐ 65 · Updated 2 years ago
- Code for "[COLM'25] RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing" · ⭐ 22 · Updated 4 months ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism · ⭐ 30 · Updated last year
- Provides a minimal implementation to extract FLAN datasets for further processing · ⭐ 11 · Updated 2 years ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning · ⭐ 27 · Updated last year
- ⭐ 14 · Updated last year
- [ICML'24] TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks · ⭐ 30 · Updated 10 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems · ⭐ 62 · Updated last year
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners" · ⭐ 108 · Updated 2 years ago
- ⭐ 99 · Updated last year
- A Portal Site for Structured Knowledge Grounding (SKG) Resources · ⭐ 9 · Updated 2 years ago
- ⭐ 14 · Updated 3 weeks ago
- A zero-shot neural semantic parser without using annotated parallel training data · ⭐ 8 · Updated 3 years ago
- [EMNLP 2022] TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data · ⭐ 17 · Updated 2 years ago
- ⭐ 54 · Updated last year
- Analyzing LLM Alignment via Token distribution shift · ⭐ 16 · Updated last year
- Code for reproducing the ACL'23 paper: Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments · ⭐ 74 · Updated 2 months ago
- Resources for our ACL 2023 paper: Distilling Script Knowledge from Large Language Models for Constrained Language Planning · ⭐ 36 · Updated last year
- ⭐ 41 · Updated last year
- A unified benchmark for math reasoning · ⭐ 88 · Updated 2 years ago
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… · ⭐ 60 · Updated 2 years ago
- Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge" · ⭐ 77 · Updated 2 years ago
- [ICLR2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials · ⭐ 36 · Updated 5 months ago
- Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning · ⭐ 131 · Updated 2 years ago
- Paper collection of methods that use language to interact with environments, including interacting with the real world, simulated worlds, or WWW… · ⭐ 129 · Updated 2 years ago
- [NAACL 2024] A Synthetic, Scalable and Systematic Evaluation Suite for Large Language Models · ⭐ 32 · Updated last year
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective · ⭐ 33 · Updated last year
- Visual and Embodied Concepts evaluation benchmark · ⭐ 21 · Updated last year
- ⭐ 28 · Updated last year
- Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs (ACL 2023) · ⭐ 64 · Updated 8 months ago