Tencent-Hunyuan/C3-Benchmark

C^3-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking

☆38

Alternatives and similar repositories for C3-Benchmark

Users who are interested in C3-Benchmark are comparing it to the repositories listed below.

thinkwee / DDR_Bench
Deep Data Research. Seek More, See Beyond.
☆16 · Feb 6, 2026 · Updated 5 months ago
SWE-bench / reading-list
Academic papers and works related to SWE-bench and SWE-agents
☆15 · Dec 8, 2025 · Updated 7 months ago
liushulinle / UloRL
An Ultra-Long Output Reinforcement Learning Approach
☆23 · Jul 31, 2025 · Updated 11 months ago
kwaipilot / SWE-Compass
☆18 · Mar 28, 2026 · Updated 3 months ago
yupeijei1997 / WildToolBench
(ICLR 2026) Benchmarking LLM Tool-Use in the Wild
☆37 · Apr 5, 2026 · Updated 3 months ago
reka-ai / research-eval
A benchmark to evaluate search-augmented LLMs
☆17 · Aug 28, 2025 · Updated 10 months ago
Tencent-Hunyuan / Hunyuan-7B
Tencent Hunyuan 7B (Hunyuan-7B for short) is one of Tencent Hunyuan's dense large language models
☆70 · Aug 11, 2025 · Updated 11 months ago
wu-zhonghua / DAT
☆18 · Oct 4, 2022 · Updated 3 years ago
nika2312 / qa_explaination
☆13 · Jul 8, 2020 · Updated 6 years ago
ypw0102 / BatchEval
Code for ACL2024-main: BatchEval: Towards Human-like Text Evaluation
☆19 · May 20, 2024 · Updated 2 years ago
Lossfunk / KernelBench-v2
KernelBench v2: Can LLMs Write GPU Kernels? - Benchmark with Torch -> Triton (and more!) problems
☆24 · Jul 4, 2025 · Updated last year
MoonshotAI / Kimi-Researcher
☆80 · Jun 20, 2025 · Updated last year
sparkle-reasoning / sparkle
[NeurIPS'25] Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
☆16 · Dec 12, 2025 · Updated 7 months ago
thinkwee / NOVER
[EMNLP-2025] R1-Zero on ANY TASK
☆32 · Nov 9, 2025 · Updated 8 months ago
Tencent-Hunyuan / Hunyuan-4B
☆16 · Aug 5, 2025 · Updated 11 months ago
chawyehsu / wende
🍎 Wende Chinese QA system (experimental)
☆10 · Jun 1, 2021 · Updated 5 years ago
princeton-nlp / ELIZA-Transformer
[NAACL 2025] Representing Rule-based Chatbots with Transformers
☆23 · Feb 9, 2025 · Updated last year
EleutherAI / deep-ignorance
☆20 · Jan 7, 2026 · Updated 6 months ago
rohit18115 / ICASSP-2022-latex-template
This is the latex template that should be used for the paper submission in ICASSP 2022
☆12 · Sep 14, 2021 · Updated 4 years ago
luongthecong123 / fp8-quant-matmul
Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.
☆19 · Feb 9, 2026 · Updated 5 months ago
BKHMSI / mixture-of-cognitive-reasoners
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
☆46 · Feb 7, 2026 · Updated 5 months ago
Jaimboh / Llamaberry-Chain-of-Thought-Reasoning-in-AI
Implementation of a multi-turn Chain of Thought (CoT) reasoning system, powered by the Llama 3.1 70B model on Groq.
☆18 · Sep 22, 2024 · Updated last year
bytedance / FTRL
[ACL 2026] Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments
☆52 · Jul 10, 2026 · Updated 2 weeks ago
uq-project / UQ
UQ: Assessing Language Models on Unsolved Questions
☆30 · Aug 26, 2025 · Updated 11 months ago
wwh0411 / FedMABench
[EMNLP 2025 Main Oral] FedMABench: Benchmarking Mobile GUI Agents on Decentralized Heterogeneous User Data.
☆16 · Nov 11, 2025 · Updated 8 months ago
thinkwee / HiMe
One-Stop Personal Health AI Agent "Say Hi to Healthy Me"
☆46 · Updated this week
Leolty / repobench
✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024
☆214 · Aug 16, 2024 · Updated last year
SakanaAI / ab-mcts-arc2
☆117 · Jun 30, 2025 · Updated last year
pixeli99 / MixLN
[ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…
☆30 · Jul 24, 2025 · Updated last year
chenchen0103 / ACEBench
☆188 · Oct 29, 2025 · Updated 8 months ago
ridgesai / ridges-old
☆12 · May 30, 2025 · Updated last year
Heidelberg-NLP / MHKA
The corresponding code from our paper "Social Commonsense Reasoning with Multi-Head Knowledge Attention (EMNLP 2020)". Do not hesitate to…
☆11 · Jun 12, 2022 · Updated 4 years ago
THUDM / SWE-Dev
[ACL25' Findings] SWE-Dev is an SWE agent with a scalable test case construction pipeline.
☆64 · Jul 21, 2025 · Updated last year
cehinson / ERRANT_ZH
☆15 · Jan 21, 2021 · Updated 5 years ago
OpenMOSS / Lorsa
☆30 · Nov 9, 2025 · Updated 8 months ago
changzhisun / entrel-joint-mrt
☆19 · Sep 11, 2018 · Updated 7 years ago
chenyaofo / CTNAS
[CVPR 2021] Contrastive Neural Architecture Search with Neural Architecture Comparators
☆40 · Apr 11, 2022 · Updated 4 years ago
shangshang-wang / Resa
Resa: Transparent Reasoning Models via SAEs
☆50 · Sep 23, 2025 · Updated 10 months ago
PRIME-RL / RL-Compositionality
FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
☆68 · Jan 26, 2026 · Updated 6 months ago