Latest Evaluation Toolkit (LatestEval). Assessing language models with the latest, uncontaminated materials.
☆29 · Feb 17, 2025 · Updated last year
Alternatives and similar repositories for LatestEval
Users who are interested in LatestEval are comparing it to the libraries listed below.
- Longitudinal Evaluation of LLMs via Data Compression ☆33 · May 29, 2024 · Updated last year
- SSLCL: An Efficient Model-Agnostic Supervised Contrastive Learning Framework for Emotion Recognition in Conversations ☆15 · Jul 27, 2024 · Updated last year
- The official repository for the paper entitled "Time Travel in LLMs: Tracing Data Contamination in Large Language Models." ☆13 · Jun 11, 2024 · Updated last year
- Source code of the paper “A Novel Three-Stage Learning Framework for Low-Resource Knowledge-Grounded Dialogue Generation” ☆16 · Nov 25, 2021 · Updated 4 years ago
- Evaluating LLMs with Dynamic Data ☆113 · Apr 20, 2026 · Updated 2 weeks ago
- ☆11 · Jul 11, 2023 · Updated 2 years ago
- Code and dataset for the EMNLP 2020 paper "PARADE: A New Dataset for Paraphrase Identification Requiring Computer Science Domain Knowledge" ☆12 · Nov 6, 2020 · Updated 5 years ago
- (EACL 2021) Discourse-Aware Unsupervised Summarization of Long Scientific Documents ☆25 · Jun 12, 2023 · Updated 2 years ago
- Mini Model Daemon ☆13 · Nov 9, 2024 · Updated last year
- Website for the release of the TellMeWhy dataset for why-question answering ☆14 · Nov 11, 2022 · Updated 3 years ago
- Knowledge Infused Decoding ☆70 · Dec 31, 2023 · Updated 2 years ago
- Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?" ☆33 · Mar 15, 2024 · Updated 2 years ago
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts ☆25 · Feb 23, 2024 · Updated 2 years ago
- Official implementation of the ACL 2023 paper: "Zero-shot Faithful Factual Error Correction" ☆17 · Aug 14, 2023 · Updated 2 years ago
- Submissions, baselines and evaluation scripts for the 2nd version of the WebNLG+ Challenge 2020 ☆13 · Feb 1, 2022 · Updated 4 years ago
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆115 · Jan 29, 2026 · Updated 3 months ago
- ☆17 · Mar 20, 2025 · Updated last year
- Test images with inappropriate labels in the MNIST dataset ☆10 · Mar 3, 2018 · Updated 8 years ago
- Experiments generating text with state-of-the-art deep-learning models (GPT2, Transformer XL, ...) ☆12 · May 15, 2019 · Updated 6 years ago
- Chicago Social Interaction Model (chiSIM) framework repository ☆12 · Aug 9, 2023 · Updated 2 years ago
- Chinese tokens in tiktoken tokenizers. ☆33 · May 15, 2024 · Updated last year
- Explore what LLMs are really learning over SFT ☆28 · Mar 30, 2024 · Updated 2 years ago
- AIrmageddon is a home security AI Agent ☆11 · Aug 30, 2024 · Updated last year
- 🤖 Code for our EMNLP 2022 paper: "BotsTalk: Machine-sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Dataset… ☆16 · Oct 7, 2024 · Updated last year
- RWKV-7 mini ☆12 · Mar 29, 2025 · Updated last year
- Direct Preference Optimization for RWKV, aiming for RWKV-5 and 6. ☆11 · Mar 1, 2024 · Updated 2 years ago
- Published version of composing programs textbook ☆15 · Mar 8, 2014 · Updated 12 years ago
- PyTorch implementation of experiments in the paper Aligning Language Models with Human Preferences via a Bayesian Approach ☆32 · Nov 6, 2023 · Updated 2 years ago
- ☆18 · Jun 9, 2025 · Updated 11 months ago
- GoldFinch and other hybrid transformer components ☆13 · Dec 9, 2025 · Updated 5 months ago
- ☆27 · Dec 8, 2025 · Updated 5 months ago
- Do Large Language Models Know What They Don’t Know? ☆103 · Nov 8, 2024 · Updated last year
- Original PoC for CVE-2023-30367 ☆17 · Jan 4, 2024 · Updated 2 years ago
- DeFacto - Demonstrations and Feedback for improving factual consistency of text summarization ☆30 · Dec 19, 2022 · Updated 3 years ago
- Mathematical Analysis (and functional analysis) ☆14 · Feb 1, 2022 · Updated 4 years ago
- ☆13 · Feb 26, 2023 · Updated 3 years ago
- Modified Beam Search with periodic restart ☆12 · Sep 12, 2024 · Updated last year
- ☆13 · Dec 5, 2022 · Updated 3 years ago
- Synthetic data generation for TODs ☆23 · Jul 17, 2024 · Updated last year