mazzzystar / TurtleBench
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles.
☆149Updated 8 months ago
Alternatives and similar repositories for TurtleBench
Users who are interested in TurtleBench are comparing it to the repositories listed below.
- Deep Reasoning Translation (DRT) Project☆224Updated 3 weeks ago
- Qwen GRPO Graph Extraction RL Finetune☆49Updated 2 months ago
- 🍎APPL: A Prompt Programming Language. Seamlessly integrate LLMs with programs.☆251Updated 4 months ago
- ☆50Updated 2 months ago
- ☆418Updated last week
- Repo for NAACL 2025 Paper "Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization"☆278Updated 5 months ago
- Train a Language Model with GRPO to create a schedule from a list of events and priorities☆206Updated last month
- Multiple instructed-LLMs engage in multi-round "self-questioning" to seek the optimal solution, borrowing from the idea of debate, iterat…☆78Updated 10 months ago
- Evaluation for AI apps and agent☆42Updated last year
- A streamlined, user-friendly JSON streaming preprocessor, crafted in Python.☆102Updated 9 months ago
- Countdown Game Distill&RL☆47Updated 2 months ago
- An Awesome List of Reinforcement Learning-based Large Language Agent Works. Collect directly from official code base.☆154Updated this week
- Using APPL to reimplement popular algorithms for Large Language Models (LLMs) and prompts☆45Updated 5 months ago
- Use free LLM APIs together with your private-domain data to generate SFT training data at no cost; supports the training data formats of tools such as llamafactory. Synthetic data☆167Updated 7 months ago
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆216Updated 4 months ago
- ☆489Updated 4 months ago
- GLM Series Edge Models☆142Updated 2 weeks ago
- Conversational Retrieval Evaluation Dataset☆100Updated 3 months ago
- ☆224Updated last year
- A lightweight script for processing HTML page to markdown format with support for code blocks☆79Updated last year
- A Python Package to Access World-Class Generative Models☆127Updated last year
- A handy tool for prompt engineers: compare the results of multiple prompts across multiple LLMs at the same time☆95Updated last year
- GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation☆205Updated last week
- Build games with GPT☆313Updated 11 months ago
- We are the first fully commercially usable role-play large language model.☆40Updated 10 months ago
- ☆109Updated 6 months ago
- Chinese dataset distilled from the full-strength DeepSeek-R1☆56Updated 4 months ago
- Convert different model APIs into the OpenAI API format out of the box.☆153Updated last year
- The official repository of the dots.llm1 base and instruct models proposed by rednote-hilab.☆412Updated 2 weeks ago
- The official code for "Aurora: Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning"☆264Updated last year