meituan-longcat/Meeseeks

An iterative, feedback-driven benchmark of LLMs' instruction-following ability

☆58

Alternatives and similar repositories for Meeseeks

Users who are interested in Meeseeks are comparing it to the repositories listed below.

AGI-Eval-Official / CoreCodeBench
☆16 · Nov 20, 2025 · Updated 8 months ago
NJUNLP / Hallu-PI
The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed …
☆11 · Sep 27, 2024 · Updated last year
Junjie-Ye / MulDimIF
[ACL 2026] A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
☆23 · Jul 10, 2026 · Updated last week
THU-KEG / VerIF
[EMNLP 2025] Verification Engineering for RL in Instruction Following
☆57 · Mar 30, 2026 · Updated 3 months ago
meituan-longcat / R-HORIZON
[ICLR'26] R-HORIZON: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
☆27 · May 9, 2026 · Updated 2 months ago
meituan-longcat / LongCat-Flash-Thinking
☆287 · May 13, 2026 · Updated 2 months ago
yuleiqin / RAIF
A Recipe for Building LLM Reasoners to Solve Complex Instructions
☆32 · Oct 9, 2025 · Updated 9 months ago
sauradip / MUPPET
[arXiv 2023] This repository contains the code for "MUPPET: Multi-Modal Few-Shot Temporal Action Detection"
☆16 · Aug 30, 2023 · Updated 2 years ago
meituan-longcat / LongCat-Flash-Chat
☆1,352 · Jun 23, 2026 · Updated 3 weeks ago
Rainier-rq / verl-if
Official implementation of the paper "Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following"
☆40 · Jan 11, 2026 · Updated 6 months ago
sastpg / CoVo
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
☆25 · Jun 25, 2025 · Updated last year
taishan1994 / pytorch_unbalanced_text_classification
Text classification on imbalanced data with PyTorch
☆12 · Dec 26, 2021 · Updated 4 years ago
THU-KEG / AgentIF
[NIPS 2025 DB Spotlight] AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
☆39 · Dec 1, 2025 · Updated 7 months ago
lichengliu03 / unary-feedback
☆44 · Mar 31, 2026 · Updated 3 months ago
thu-coai / CROPI
[ACL'26] Official repository for the paper "Data-Efficient RLVR via Off-Policy Influence Guidance"
☆24 · Mar 29, 2026 · Updated 3 months ago
VIM-Bench / VIM_TOOL
☆12 · Jun 12, 2024 · Updated 2 years ago
zthang / Focus
☆24 · Feb 3, 2024 · Updated 2 years ago
Strong-AI-Lab / ChatLogic
☆16 · Dec 17, 2023 · Updated 2 years ago
zhchen18 / ToMBench
ToMBench: Benchmarking Theory of Mind in Large Language Models, ACL 2024.
☆68 · Jun 24, 2024 · Updated 2 years ago
xyjigsaw / Linux-Knowledge-Graph
Knowledge Graph for Linux in Triples and Neo4j
☆13 · Aug 22, 2020 · Updated 5 years ago
Tongyi-CCAI / Complex-IF
☆34 · Jan 26, 2026 · Updated 5 months ago
meituan-longcat / vitabench
[ICLR 2026] VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
☆157 · Feb 22, 2026 · Updated 4 months ago
primepake / dac_vae
Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder
☆38 · Aug 30, 2025 · Updated 10 months ago
SJTU-lqiu / QA4IE
Original implementation of QA4IE
☆25 · Jul 28, 2021 · Updated 4 years ago
lean-dojo / lean4code
Lean4 Code Editor
☆17 · Jul 14, 2026 · Updated last week
ntunlp / ptrnet-depparser
☆11 · Oct 13, 2019 · Updated 6 years ago
pku-sixing / IJCAI2020-TopicKA
Resources for our IJCAI 2020 paper, TopicKA: Generating Commonsense Knowledge-Aware Dialogue Responses Towards the Recommended Topic Fact
☆12 · Nov 30, 2020 · Updated 5 years ago
facebookresearch / Multi-IF
The evaluation code for MultiIF multi-turn and multi-lingual instruction following
☆63 · Oct 29, 2024 · Updated last year
wzhouad / WPO
Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization"
☆41 · Sep 24, 2024 · Updated last year
Tele-AI / TeleChat2.5
☆30 · Jul 25, 2025 · Updated 11 months ago
varshakishore / IncDSI
☆11 · Sep 10, 2023 · Updated 2 years ago
leileqiTHU / Attacker
The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1
☆13 · Apr 23, 2025 · Updated last year
xydaytoy / EVA
☆14 · Apr 16, 2024 · Updated 2 years ago
yunan4nlp / NNDisParser
☆10 · Aug 30, 2022 · Updated 3 years ago
vl-rewardbench / VL_RewardBench
☆29 · Jul 23, 2025 · Updated 11 months ago
THUSE-Course / course-index
☆11 · Mar 3, 2026 · Updated 4 months ago
AI0Research / MRDL-and-MRDR
☆10 · Apr 5, 2025 · Updated last year
NJU-LINK / IF-VidCap
The Source Code for IF-VidCap @ICLR 2026
☆19 · Oct 22, 2025 · Updated 8 months ago
malihealikhani / CITE
CITE: A Corpus of Image-Text Discourse Relations
☆13 · Apr 7, 2019 · Updated 7 years ago