Using conversational games to evaluate powerful LLMs
☆18Sep 3, 2023Updated 2 years ago
Alternatives and similar repositories for GameEval
Users that are interested in GameEval are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [ACL2023, Findings] Source codes for the paper "Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduc…☆16Feb 22, 2025Updated last year
- OpenBA-V2: 3B LLM (Large Language Model) with T5 architecture, utilizing model pruning technique and continuing pretraining from OpenBA-1…☆25May 10, 2024Updated last year
- This is the codebase for pre-training, compressing, extending, and distilling LLMs with Megatron-LM.☆12Mar 11, 2024Updated 2 years ago
- Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025)☆14Mar 6, 2025Updated last year
- Use the tokenizer in parallel to achieve superior acceleration☆20Mar 21, 2024Updated 2 years ago
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- [ICASSP 2025 Oral] The official implementation of paper "TextureDiffusion: Target Prompt Disentangled Editing for Various Texture Transfe…☆16Mar 13, 2025Updated last year
- Text-based game of lies and deceit, made for language models.☆32Aug 25, 2023Updated 2 years ago
- This is the official Gtihub repo for our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Lang…☆22Jul 3, 2024Updated last year
- SpyGame: An interactive multi-agent framework to evaluate intelligence with large language models :D☆15Nov 9, 2023Updated 2 years ago
- Source code of paper “A Novel Three-Stage Learning Framework for Low-Resource Knowledge-Grounded Dialogue Generation”☆16Nov 25, 2021Updated 4 years ago
- [AAAI 2025] Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems☆13May 5, 2025Updated last year
- The code used to power DeepRole☆37Nov 21, 2022Updated 3 years ago
- [ACL 2024 Findings] Code implementation of Paper "Rethinking Negative Instances for Generative Named Entity Recognition"☆60Mar 20, 2024Updated 2 years ago
- LongAttn :Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 9 months ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- ☆36Oct 14, 2022Updated 3 years ago
- Source code for paper Grammatical Error Correction in Low-Resource Scenarios (W-NUT 2019)☆13Jun 21, 2022Updated 3 years ago
- What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness☆26May 16, 2025Updated 11 months ago
- SGLang Kernel Wheel Index☆22Updated this week
- Introducing Filtered Direct Preference Optimization (fDPO) that enhances language model alignment with human preferences by discarding lo…☆16Nov 27, 2024Updated last year
- Repo for our AKBC-2021 paper: Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering☆10Oct 10, 2021Updated 4 years ago
- A minimum demo for PyTorch distributed extension functionality for collectives.☆15Jul 29, 2024Updated last year
- ☆41Jun 19, 2024Updated last year
- MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting☆20Jul 11, 2023Updated 2 years ago
- Serverless GPU API endpoints on Runpod - Get Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs☆29Aug 15, 2025Updated 8 months ago
- Natural Language to Overpass Query Language☆30Mar 14, 2024Updated 2 years ago
- A record of reading list on some MLsys popular topic☆24Mar 20, 2025Updated last year
- Some C++/C/CUDA Extension☆16Feb 2, 2022Updated 4 years ago
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection☆54Oct 29, 2024Updated last year
- Code for Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack(TPAMI 2025)☆43Aug 28, 2025Updated 8 months ago
- A collection of instruction data and scripts for machine translation.☆20Sep 23, 2023Updated 2 years ago
- The aim of this repository is to utilize LLaMA to reproduce and enhance the Stanford Alpaca☆98Apr 5, 2023Updated 3 years ago
- Create a simple Audio Recorder in Flutter that records the phone's microphone and your voice.☆11Apr 11, 2022Updated 4 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆110Jul 15, 2025Updated 9 months ago
- this is an implementation for the paper Improve Mathematical Reasoning in Language Models by Automated Process Supervision from google de…☆46Jul 8, 2025Updated 10 months ago
- [NeurIPS'25] Official Implementation of RISE (Reinforcing Reasoning with Self-Verification)☆32Aug 8, 2025Updated 9 months ago
- ☆27Feb 13, 2026Updated 2 months ago
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols☆18Nov 19, 2025Updated 5 months ago
- Collection of latest papers and materials in the area of RLVR!☆101Updated this week
- Code and data for the paper: Competing Large Language Models in Multi-Agent Gaming Environments☆97Jan 26, 2026Updated 3 months ago