modelscope / evalscope
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
☆876 Updated this week
Alternatives and similar repositories for evalscope:
Users who are interested in evalscope are comparing it to the libraries listed below.
- Community maintained hardware plugin for vLLM on Ascend ☆569 Updated this week
- ☆884 Updated last month
- Qwen (Tongyi Qianwen) vLLM inference deployment demo ☆571 Updated last year
- Train a 1B LLM with 1T tokens from scratch as an individual ☆627 Updated last week
- Unify Efficient Fine-tuning of RAG Retrieval, including Embedding, ColBERT, ReRanker. ☆831 Updated this week
- A curated collection of open-source SFT datasets, updated on an ongoing basis ☆509 Updated last year
- ☆323 Updated 10 months ago
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL ☆2,258 Updated this week
- A multi-dimensional Chinese alignment evaluation benchmark for large language models (ACL 2024) ☆382 Updated 8 months ago
- The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud. ☆1,033 Updated this week
- Netease Youdao's open-source embedding and reranker models for RAG products. ☆1,722 Updated 3 months ago
- Easy-to-Use RAG Framework; CCF AIOps International Challenge 2024 Top 3 (third-place) solution ☆428 Updated 5 months ago
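- This project collects open-source datasets for table intelligence tasks (such as table question answering and table-to-text generation), converts the raw data into instruction-tuning format, and fine-tunes LLMs on it to strengthen their understanding of tabular data, with the ultimate goal of building a large language model dedicated to table intelligence tasks. ☆568 Updated last year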
- Minimal-cost training of a 0.5B R1-Zero model ☆714 Updated last week
- Phi2-Chinese-0.2B: train your own small Chinese Phi2 chat model from scratch, with support for LangChain integration to load a local knowledge base for retrieval-augmented generation (RAG). ☆550 Updated 9 months ago
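- A project for training a large language model from scratch, covering pre-training, fine-tuning, and direct preference optimization; the model has 1B parameters and supports Chinese and English. ☆379 Updated 2 months ago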
- CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models ☆302 Updated 6 months ago
- A repository for individuals to experiment with and reproduce the LLM pre-training process. ☆426 Updated this week
- unified embedding model ☆854 Updated last year
- CMMLU: Measuring massive multitask language understanding in Chinese ☆755 Updated 5 months ago
- Distributed RL System for LLM Reasoning ☆1,205 Updated last week
- ☆679 Updated 3 weeks ago
- FlagEval is an evaluation toolkit for AI large foundation models. ☆335 Updated last week
- Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs… ☆522 Updated 6 months ago
- High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability. ☆1,110 Updated this week
- LongBench v2 and LongBench (ACL 2024) ☆861 Updated 3 months ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆715 Updated 3 months ago
- ☆937 Updated 2 months ago
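- This project runs evaluation experiments on the retrieval techniques and algorithms used in the Retrieve stage of RAG. The main framework used is LlamaIndex. ☆247 Updated 4 months ago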
- LLMs interview notes and answers: this repository mainly records interview questions and reference answers for large language model (LLM) algorithm engineers ☆526 Updated last year