mazzzystar / TurtleBench
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles.
☆145Updated 6 months ago
Alternatives and similar repositories for TurtleBench:
Users who are interested in TurtleBench are comparing it to the repositories listed below.
- Repo for NAACL 2025 Paper "Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization"☆268Updated 3 months ago
- Multiple instructed-LLMs engage in multi-round "self-questioning" to seek the optimal solution, borrowing from the idea of debate, iterat…☆76Updated 8 months ago
- ☆50Updated last week
- 🍎APPL: A Prompt Programming Language. Seamlessly integrate LLMs with programs.☆244Updated 2 months ago
- Deep Reasoning Translation via Reinforcement Learning (arXiv preprint 2025); DRT: Deep Reasoning Translation via Long Chain-of-Thought (a…☆214Updated last week
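- Use free LLM APIs together with your private-domain data to generate SFT training data (entirely free of charge); supports the training data formats of tools such as llamafactory. Synthetic data☆154Updated 4 months ago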
- 🌐 WebWalker: Benchmarking LLMs in Web Traversal☆382Updated last week
- ☆104Updated 4 months ago
- Qwen GRPO Graph Extraction RL Finetune☆45Updated 2 weeks ago
- ☆436Updated 2 months ago
- ☆221Updated last year
- Using APPL to reimplement popular algorithms for Large Language Models (LLMs) and prompts☆43Updated 3 months ago
- ☆221Updated 2 months ago
- Convert different model APIs into the OpenAI API format out of the box.☆150Updated last year
- 😜 A visual dataset of memes, annotated using the image-understanding capabilities of glm-4v and step-1v.☆119Updated 11 months ago
- The Level-Navi Agent, a framework that requires no training and utilizes large language models for deep query understanding and precise s…☆78Updated 3 months ago
- GLM Series Edge Models☆136Updated 2 months ago
- FlexRAG: A RAG Framework for Information Retrieval and Generation.☆153Updated last week
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆186Updated 2 months ago
- [ICLR 2025] The First Multimodal Search Engine Pipeline and Benchmark for LMMs☆427Updated 2 months ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments.☆244Updated last week
- A lightweight script for converting HTML pages to markdown format with support for code blocks☆79Updated last year
- TEaR framework for paper "TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement"☆43Updated 8 months ago
- As the name suggests: a hand-built RAG☆121Updated last year
- The official codes for "Aurora: Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning"☆260Updated 11 months ago
- 🔥Your Daily Dose of AI Research from Hugging Face 🔥 Stay updated with the latest AI breakthroughs! This bot automatically collects and…☆50Updated this week
- XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced Retrieval-Augmented Generation☆92Updated 2 months ago
- ROGRAG: A Robustly Optimized GraphRAG Framework☆110Updated 2 weeks ago
- An LLM-based Agent that predicts its tasks proactively.☆346Updated last month
- A Toolkit for Running On-device Large Language Models (LLMs) in APP☆72Updated 9 months ago