ekwinox117/multi-challenge

☆91

Alternatives and similar repositories for multi-challenge

Users who are interested in multi-challenge are comparing it to the repositories listed below.

vectara / FaithJudge
☆18 · Nov 11, 2025 · Updated 8 months ago
esteng / regal_program_learning
☆27 · Sep 11, 2024 · Updated last year
benpry / chain-of-thought-metaphor
This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod…
☆14 · Apr 28, 2023 · Updated 3 years ago
PKU-Baichuan-MLSystemLab / CFBench
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
☆55 · Aug 26, 2024 · Updated last year
NY1024 / RACE
☆27 · Mar 17, 2025 · Updated last year
allenai / IFBench
☆160 · May 13, 2026 · Updated 2 months ago
martin-wey / CodeUltraFeedback
CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)
☆76 · Jun 25, 2024 · Updated 2 years ago
KwanWaiChung / MT-Eval
Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models"
☆57 · Nov 18, 2025 · Updated 8 months ago
kohjingyu / multi-agent-computer-use
Code for the multi-agent computer use project.
☆19 · Jul 3, 2026 · Updated 2 weeks ago
mtbench101 / mt-bench-101
[ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
☆152 · Jul 24, 2024 · Updated last year
microsoft / lost_in_conversation
Code that accompanies the public release of the paper Lost in Conversation (https://arxiv.org/abs/2505.06120)
☆292 · Jun 9, 2026 · Updated last month
THU-KEG / KoLA
[ICLR24] The open-source repo of THU-KEG's KoLA benchmark.
☆57 · Sep 28, 2023 · Updated 2 years ago
Xt-cyh / CoDI-Eval
☆22 · May 7, 2025 · Updated last year
BytedTsinghua-SIA / Enigmata
Resources for the Enigmata Project.
☆82 · Aug 13, 2025 · Updated 11 months ago
renll / SparseLT
[EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing
☆14 · Feb 10, 2023 · Updated 3 years ago
janphilippfranken / sami
Self-Supervised Alignment with Mutual Information
☆20 · May 24, 2024 · Updated 2 years ago
lainisourgod / sirius-supply-chain
funny lab, absolute ai security maxxxing
☆18 · May 25, 2026 · Updated last month
SalesforceAIResearch / FoFo
☆27 · Jun 2, 2026 · Updated last month
nickjw0205 / Improving-ASR-with-LLM-Description
☆20 · Sep 2, 2024 · Updated last year
yubol-bobo / Awesome-Multi-Turn-LLMs
This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language …
☆199 · Jul 11, 2026 · Updated last week
PrekshaNema25 / StructuredData_To_Descriptions
☆17 · Oct 5, 2018 · Updated 7 years ago
chenllliang / ATP-AMR
Source code for paper "ATP: AMRize Than Parse! Enhancing AMR Parsing with PseudoAMRs" @NAACL-2022
☆15 · Mar 31, 2023 · Updated 3 years ago
ZQS1943 / GLEN
code for "GLEN: General-Purpose Event Detection for Thousands of Types"
☆13 · Nov 6, 2023 · Updated 2 years ago
thu-coai / ComplexBench
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
☆102 · Feb 20, 2025 · Updated last year
satrams / rent-rl
RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs.
☆42 · Oct 31, 2025 · Updated 8 months ago
aorwall / moatless-testbeds
Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git…
☆14 · Apr 9, 2025 · Updated last year
violet-zct / swarm-distillation-zero-shot
☆23 · Oct 15, 2022 · Updated 3 years ago
Tongyi-CCAI / Complex-IF
☆34 · Jan 26, 2026 · Updated 5 months ago
PKU-Alignment / beavertails
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
☆182 · Oct 27, 2023 · Updated 2 years ago
allenai / olmes
Reproducible, flexible LLM evaluations
☆388 · Mar 24, 2026 · Updated 3 months ago
NP-NET-research / wdel
WDEL is an entity linking system based on the Wikidata knowledge base.
☆11 · Feb 12, 2025 · Updated last year
jszheng21 / RACE
RACE is a multi-dimensional benchmark for code generation that focuses on Readability, mAintainability, Correctness, and Efficiency.
☆14 · Oct 12, 2024 · Updated last year
freesunshine0316 / RaST-plus
☆21 · Nov 14, 2022 · Updated 3 years ago
vicksEmmanuel / latent-gemma
☆27 · Jan 14, 2025 · Updated last year
SIMONLQY / RethinkMCTS
☆34 · Oct 2, 2024 · Updated last year
CriticBench / CriticBench
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
☆31 · Mar 5, 2024 · Updated 2 years ago
xingyaoww / mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…
☆140 · Jun 4, 2024 · Updated 2 years ago
tongmeihan1995 / DocEE
DocEE: A Large-Scale and Fine-grained Benchmark for Document-level Event Extraction
☆42 · Apr 19, 2023 · Updated 3 years ago
TIGER-AI-Lab / CritiqueFineTuning
Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025]
☆182 · Jul 8, 2025 · Updated last year