YihongDong / CDD-TED4LLMs
☆16Nov 26, 2024Updated last year
Alternatives and similar repositories for CDD-TED4LLMs
Users who are interested in CDD-TED4LLMs are comparing it to the repositories listed below
- This repository hosts the source code for the paper "ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Mo…☆16Dec 16, 2025Updated last month
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference …☆14Dec 12, 2024Updated last year
- ☆15Jun 18, 2024Updated last year
- Come generate an annual summary of your browsing history!☆18Dec 12, 2024Updated last year
- The evaluation framework for the InfiCoder-Eval benchmark.☆21Jul 22, 2024Updated last year
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning☆30Mar 5, 2024Updated last year
- Releasing code for "ReCode: Robustness Evaluation of Code Generation Models"☆58Mar 20, 2024Updated last year
- The Paper List on Data Contamination for Large Language Models Evaluation.☆110Jan 29, 2026Updated 2 weeks ago
- ☆33Feb 2, 2026Updated last week
- ☆28Oct 28, 2023Updated 2 years ago
- Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors (ACL 2023)☆28Mar 26, 2024Updated last year
- List of research papers of ICSE, FSE, ASE, and ISSTA since 2020.☆33Oct 16, 2025Updated 3 months ago
- ☆26Jul 19, 2022Updated 3 years ago
- Text of BnF Ms Fr 640 in multiple formats, metadata about the manuscript, and derived data☆13Dec 22, 2025Updated last month
- ☆28Nov 29, 2022Updated 3 years ago
- A library for red-teaming LLM applications with LLMs.☆29Oct 11, 2024Updated last year
- exploring whether LLMs perform case-based or rule-based reasoning☆30Mar 2, 2024Updated last year
- Automated Benchmarking of LLM Agents on Real-World Software Security Tasks [NeurIPS 2025]☆55Jan 27, 2026Updated 2 weeks ago
- ☆30Dec 27, 2024Updated last year
- ☆39Feb 4, 2026Updated last week
- BeHonest: Benchmarking Honesty in Large Language Models☆34Aug 15, 2024Updated last year
- ☆12Nov 30, 2018Updated 7 years ago
- ☆16Jul 7, 2025Updated 7 months ago
- [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua…☆36Jun 8, 2023Updated 2 years ago
- Baselines for all tasks from Long Code Arena benchmarks 🏟️☆39Mar 30, 2025Updated 10 months ago
- ☆44Jun 24, 2025Updated 7 months ago
- A framework for few-shot evaluation of autoregressive language models.☆12Jul 14, 2025Updated 7 months ago
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment"☆10May 5, 2024Updated last year
- ☆11May 6, 2021Updated 4 years ago
- The official repository for "Rongsheng Wang's Arxiv Template"☆55May 7, 2025Updated 9 months ago
- A Swedish Natural Language Understanding Benchmark☆11Dec 12, 2025Updated 2 months ago
- [CVPR2024] Learning from Synthetic Human Group Activities☆14Feb 24, 2025Updated 11 months ago
- Stock price prediction using an AdaBoost-based SVM☆11Mar 4, 2018Updated 7 years ago
- ☆12Jan 11, 2026Updated last month
- Extracts static code features from opencl kernels to be used for machine learning.☆10Apr 30, 2021Updated 4 years ago
- ☆14Mar 5, 2024Updated last year
- ☆11Jul 14, 2024Updated last year
- ☆11Nov 5, 2024Updated last year
- ☆11Oct 15, 2022Updated 3 years ago