☆16 · Nov 26, 2024 · Updated last year
Alternatives and similar repositories for CDD-TED4LLMs
Users that are interested in CDD-TED4LLMs are comparing it to the libraries listed below
- This repository hosts the source code for the paper "ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Mo… ☆16 · Dec 16, 2025 · Updated 2 months ago
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference … ☆14 · Dec 12, 2024 · Updated last year
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆30 · Mar 5, 2024 · Updated 2 years ago
- Releasing code for "ReCode: Robustness Evaluation of Code Generation Models" ☆58 · Mar 20, 2024 · Updated last year
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆110 · Jan 29, 2026 · Updated last month
- ☆33 · Updated this week
- ☆28 · Oct 28, 2023 · Updated 2 years ago
- Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors (ACL 2023) ☆28 · Mar 26, 2024 · Updated last year
- ☆26 · Jul 19, 2022 · Updated 3 years ago
- exploring whether LLMs perform case-based or rule-based reasoning ☆30 · Mar 2, 2024 · Updated 2 years ago
- A library for red-teaming LLM applications with LLMs. ☆29 · Oct 11, 2024 · Updated last year
- ☆30 · Dec 27, 2024 · Updated last year
- ☆39 · Feb 25, 2026 · Updated last week
- BeHonest: Benchmarking Honesty in Large Language Models ☆34 · Aug 15, 2024 · Updated last year
- ☆12 · Nov 30, 2018 · Updated 7 years ago
- ☆16 · Jul 7, 2025 · Updated 7 months ago
- [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua… ☆36 · Jun 8, 2023 · Updated 2 years ago
- ☆44 · Jun 24, 2025 · Updated 8 months ago
- A Swedish Natural Language Understanding Benchmark ☆11 · Dec 12, 2025 · Updated 2 months ago
- ☆12 · Jan 11, 2026 · Updated last month
- Stock price prediction with an AdaBoost-based SVM ☆11 · Mar 4, 2018 · Updated 8 years ago
- [CVPR2024] Learning from Synthetic Human Group Activities ☆14 · Feb 24, 2025 · Updated last year
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" ☆10 · May 5, 2024 · Updated last year
- A framework for few-shot evaluation of autoregressive language models. ☆12 · Jul 14, 2025 · Updated 7 months ago
- ☆11 · May 6, 2021 · Updated 4 years ago
- The official repository for "Rongsheng Wang's Arxiv Template" ☆55 · May 7, 2025 · Updated 9 months ago
- Extracts static code features from opencl kernels to be used for machine learning. ☆10 · Apr 30, 2021 · Updated 4 years ago
- Dataset for AAAI paper "Natural Language Inference in Context - Investigating Contextual Reasoning over Long Texts" ☆11 · Nov 18, 2022 · Updated 3 years ago
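- A Chinese financial LLM evaluation benchmark: 25 tasks across six categories with graded ratings; domestic (Chinese) models achieved an A grade ☆10 · May 6, 2024 · Updated last year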
- ☆12 · Aug 9, 2023 · Updated 2 years ago
- [ICSE 2023] Differentiable interpretation and failure-inducing input generation for neural network numerical bugs. ☆13 · Jan 5, 2024 · Updated 2 years ago
- Survey of available speech datasets for Polish ASR development ☆17 · Jan 1, 2025 · Updated last year
- The official Python library for Openlayer, the Continuous Model Improvement Platform for AI. 📈 ☆16 · Feb 25, 2026 · Updated last week
- ☆12 · Nov 5, 2024 · Updated last year
- LLM benchmarks ☆13 · Feb 22, 2024 · Updated 2 years ago
- The official repository for the paper entitled "Time Travel in LLMs: Tracing Data Contamination in Large Language Models." ☆12 · Jun 11, 2024 · Updated last year
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 · Oct 11, 2024 · Updated last year
- [ICLR 2025 SCI-FM Workshop] Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging ☆13 · Mar 27, 2025 · Updated 11 months ago
- [NeurIPS 2024] Efficiency for Free: Ideal Data Are Transportable Representations ☆19 · Jan 19, 2025 · Updated last year