Weixin-Liang / Mapping-the-Increasing-Use-of-LLMs-in-Scientific-Papers
☆38 · Updated 6 months ago
Related projects
Alternatives and complementary repositories for Mapping-the-Increasing-Use-of-LLMs-in-Scientific-Papers
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆30 · Updated 8 months ago
- Code/data for MARG (multi-agent review generation) ☆33 · Updated last week
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod… ☆29 · Updated 8 months ago
- Project repository of the paper "Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning wi… ☆27 · Updated 8 months ago
- Code repository for the paper "Mission: Impossible Language Models." ☆39 · Updated 10 months ago
- This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses". ☆26 · Updated 3 months ago
- Repository for the ACL 2024 conference website ☆17 · Updated last month
- The Prism Alignment Project ☆37 · Updated 7 months ago
- Resources for cultural NLP research ☆67 · Updated this week
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆43 · Updated 3 months ago
- Data and code for the Corr2Cause paper (ICLR 2024) ☆88 · Updated 7 months ago
- Data and code for the preprint "In-Context Learning with Long-Context Models: An In-Depth Exploration" ☆31 · Updated 3 months ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data ☆30 · Updated 3 months ago
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment ☆38 · Updated last year
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions ☆68 · Updated last year
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… ☆55 · Updated last year
- The official repo for DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph ☆10 · Updated last month
- The repository for the scripts and materials for the paper "Simulating Opinion Dynamics with Networks of LLM-based Agents". ☆19 · Updated 4 months ago
- 🌾 Universal, customizable and deployable fine-grained evaluation for text generation. ☆22 · Updated last year
- Noise-robust de-duplication at scale ☆15 · Updated last year