cubenlp / BIBench
BIBench: An LLM evaluation benchmark for the data analysis domain
☆14 Updated 8 months ago
Related projects
Alternatives and complementary repositories for BIBench
- ☆129 Updated 4 months ago
- ☆91 Updated 11 months ago
- ☆125 Updated last year
- SuperCLUE-Agent: A benchmark for evaluating core agent capabilities on native Chinese tasks ☆78 Updated last year
- ☆120 Updated 7 months ago
- MEASURING MASSIVE MULTITASK CHINESE UNDERSTANDING ☆87 Updated 7 months ago
- ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆62 Updated 7 months ago
- ☆93 Updated 8 months ago
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆132 Updated 4 months ago
- An open-source conversational language model developed by the Knowledge Works Research Laboratory at Fudan University. ☆62 Updated last year
- ☆78 Updated 7 months ago
- ☆157 Updated last year
- ☆54 Updated last month
- FinEval is a collection of high-quality multiple-choice and text question-answering items for the Chinese financial domain. ☆161 Updated last month
- ☆61 Updated this week
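- Chinese large language model evaluation, phase 2 ☆70 Updated last year
- HanFei-1.0 (韩非), the first legal large language model in China trained with full-parameter training ☆99 Updated last year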
- Retrieves parquet files from Hugging Face, identifies and quantifies junky data, duplication, contamination, and biased content in datase… ☆50 Updated last year
- TianGong-AI-Unstructure ☆51 Updated this week
- Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?" (AAAI2024) ☆44 Updated 7 months ago
- A Toolkit for Table-based Question Answering ☆105 Updated last year
- [ICLR24] The open-source repo of THU-KEG's KoLA benchmark. ☆50 Updated last year
- CodeGPT: A Code-Related Dialogue Dataset Generated by GPT and for GPT ☆110 Updated last year
- A native Chinese benchmark for evaluating retrieval-augmented generation ☆100 Updated 7 months ago
- [EMNLP 2023 Demo] CLEVA: Chinese Language Models EVAluation Platform ☆57 Updated 11 months ago
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation" ☆77 Updated last year
- Unleashing the Power of Cognitive Dynamics on Large Language Models ☆60 Updated last month
- The official implementation of paper "ToolGen: Unified Tool Retrieval and Calling via Generation" ☆99 Updated 3 weeks ago
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆38 Updated 8 months ago
- Leveraging large language models for text-to-SQL synthesis, this project fine-tunes WizardLM/WizardCoder-15B-V1.0 with QLoRA on a custom … ☆43 Updated 11 months ago