The evaluation code for MultiIF: multi-turn and multilingual instruction following
☆63 · Oct 29, 2024 · Updated last year
Alternatives and similar repositories for Multi-IF
Users that are interested in Multi-IF are comparing it to the libraries listed below.
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆32 · Oct 9, 2025 · Updated 8 months ago
- CFBench: A Comprehensive Constraints-Following Benchmark for LLMs ☆54 · Aug 26, 2024 · Updated last year
- Evaluating Reward Models in Multilingual Settings (ACL Main '25) ☆42 · May 16, 2025 · Updated last year
- ☆27 · Jun 2, 2026 · Updated 3 weeks ago
- Code for AAAI 2023 research track paper "Question Decomposition Tree for Answering Complex Questions over Knowledge Bases" ☆17 · Jan 3, 2024 · Updated 2 years ago
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models ☆118 · Jun 12, 2025 · Updated last year
- CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings ☆75 · Feb 3, 2025 · Updated last year
- MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency ☆32 · Sep 11, 2023 · Updated 2 years ago
- Pythonic wrappers for Cider/CiderD evaluation metrics. Provides CIDEr as well as CIDEr-D (CIDEr Defended), which is more robust to gaming … ☆13 · Dec 4, 2025 · Updated 6 months ago
- Evaluate the Quality of Critique ☆37 · Jun 1, 2024 · Updated 2 years ago
- The official repository of the Omni-MATH benchmark. ☆94 · Dec 22, 2024 · Updated last year
- The code implementation of the EMNLP 2022 paper: DisCup: Discriminator Cooperative Unlikelihood Prompt-tuning for Controllable Text Gene… ☆27 · Nov 13, 2023 · Updated 2 years ago
- Modified CartPole-v0 OpenAI Gym environment with various noisy cases and a reinforcement-learning-based controller ☆10 · Dec 5, 2017 · Updated 8 years ago
- Code and data for NAACL 2025 paper "IHEval: Evaluating Language Models on Following the Instruction Hierarchy" ☆17 · Feb 25, 2025 · Updated last year
- ☆14 · Aug 15, 2024 · Updated last year
- This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language … ☆199 · Apr 29, 2026 · Updated 2 months ago
- This repo explores how AMR can be used to address tasks that are difficult for LLMs ☆13 · Jan 15, 2024 · Updated 2 years ago
- ☆33 · Aug 30, 2023 · Updated 2 years ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆54 · Jun 24, 2024 · Updated 2 years ago
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF ☆25 · Oct 8, 2024 · Updated last year
- ☆89 · Dec 29, 2023 · Updated 2 years ago
- ☆89 · Feb 5, 2025 · Updated last year
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo… ☆414 · Jun 25, 2025 · Updated last year
- [ICLR 2024] COLLIE: Systematic Construction of Constrained Text Generation Tasks ☆63 · Aug 2, 2023 · Updated 2 years ago
- High-accuracy CAPTCHA solver for the SJTU Jaccount login page using SVM and ResNet. ☆14 · Nov 9, 2022 · Updated 3 years ago
- [CIKM 2025] Constraint Back-translation Improves Complex Instruction Following of Large Language Models ☆19 · May 23, 2025 · Updated last year
- A Go-based HTTP relay server that relays your server's requests to the OpenAI API; it can also be used to set up a mirror site and works out of the box. ☆12 · Apr 19, 2023 · Updated 3 years ago
- Repository containing the website for the EMNLP 2023 conference ☆17 · Feb 12, 2025 · Updated last year
- Data and code for EMNLP 2022 paper "CDConv: A Benchmark for Contradiction Detection in Chinese Conversations" ☆13 · May 8, 2023 · Updated 3 years ago
- 🦫 BEAVER: An Enterprise Benchmark for Text-to-SQL (BEAVER-MAY-2025) ☆54 · May 20, 2026 · Updated last month
- [NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs ☆49 · Nov 29, 2024 · Updated last year
- Code and Data for EMNLP 2023 Paper "MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Langu… ☆14 · Apr 7, 2025 · Updated last year
- The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1 ☆13 · Apr 23, 2025 · Updated last year
- ☆10 · Feb 12, 2024 · Updated 2 years ago
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning ☆97 · Aug 15, 2023 · Updated 2 years ago
- The official implementation of "Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding" ☆22 · Jun 26, 2025 · Updated last year
- [COLM'25] Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? ☆39 · Jun 5, 2025 · Updated last year
- Caffe/Neon prototxt training file for our Neurocomputing 2017 work: Fuzzy Quantitative Deep Compression Network ☆11 · May 30, 2018 · Updated 8 years ago
- Normalized Wasserstein for Mixture Distributions ☆11 · Mar 24, 2023 · Updated 3 years ago