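A large language model evaluation platform that supports multiple evaluation benchmarks, custom datasets, and performance testing. It also supports RAG evaluation based on custom datasets.
☆87 Aug 20, 2025 Updated 8 months ago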
Alternatives and similar repositories for llm-eval
Users that are interested in llm-eval are comparing it to the libraries listed below.
- A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking. ☆2,741 Updated this week
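- A performance load-testing platform based on python3 + vue3 + locust + grafana, cool and user-friendly. ☆13 Apr 22, 2024 Updated 2 years ago
- Front-end project for fufan-chat-api. ☆28 Nov 1, 2024 Updated last year
- 灵芝 IAST (Lingzhi IAST) is an interactive application security testing tool that detects security risks in Java web applications, featuring near-real-time detection, high accuracy, a low false-positive rate, and clear vulnerability traces | Please read the official documentation before use. ☆16 Jul 18, 2020 Updated 5 years ago
- Assessment documents for 等保 (classified cybersecurity protection) compliance. ☆13 Dec 18, 2018 Updated 7 years ago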
- [WACV 2025] Official Implementation of LIME: Localized Image Editing via Attention Regularization in Diffusion Models ☆10 Apr 7, 2025 Updated last year
- A code security platform based on fortify sca windows ☆15 Mar 6, 2019 Updated 7 years ago
- This is the official implementation of ICLR 2024 paper "VDC: Versatile Data Cleanser based on Visual-Linguistic Inconsistency by Multimod… ☆19 Feb 24, 2025 Updated last year
- A practical utility library for LangChain and LangGraph development ☆105 Mar 4, 2026 Updated 2 months ago
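- 灵猫 (Lingmao) intelligent management platform is an online web platform for managing test projects and test tools. It covers project management, test case management, module management, UI automation test management, small utility tools, and other system testing tasks in a fast, agile, and flexible way. ☆11 Jun 21, 2021 Updated 4 years ago
- An enterprise gateway for LLM APIs: internal company API management and a distribution and aggregation system that converts multiple large models into a unified OpenAI-compatible interface, with special support for Chinese open-source models such as deepseek, qwen, kimi, and glm. It can be used by individuals or within enterprises for unified LLM API management and channel distribution (key management and redistribution); long-term updates, supports… ☆39 Sep 12, 2025 Updated 7 months ago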
- Web Based Iperf Result Real-time Visualization ☆15 Apr 26, 2019 Updated 7 years ago
- This project is a deliberately vulnerable environment to learn about LLM-specific risks based on the OWASP Top 10 for LLM Applications. ☆51 Jan 19, 2026 Updated 3 months ago
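- This paper proposes a safety evaluation benchmark for Chinese LLMs based on "文心一言" (ERNIE Bot), covering 8 typical safety scenarios and 6 types of instruction attacks. It also proposes a safety evaluation framework and process that uses test prompts written by hand and collected from open-source data, combining human intervention with the LLM's strong evaluation ability as a "co-evaluator". ☆34 Sep 1, 2023 Updated 2 years ago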
- ☆17 May 31, 2023 Updated 2 years ago
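- An online stress-testing platform built on JMeter, with some customized features added on top of the original version; developed from the open-source project zyanycall/stressTestPlatform. ☆15 Dec 17, 2021 Updated 4 years ago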
- ☆13 Feb 15, 2023 Updated 3 years ago
- ☆22 Sep 16, 2022 Updated 3 years ago
- Open-source threat intelligence bot for WeChat ☆12 Mar 13, 2023 Updated 3 years ago
- This repository demonstrates an application built with Java 21 + Spring Boot 3 + MyBatis, including CRUD operations, authentication, rout… ☆12 Dec 1, 2024 Updated last year
- Benchmarking Physical Risk Awareness of Foundation Model-based Embodied AI Agents ☆23 Nov 28, 2024 Updated last year
- Code for the paper "Abstractive Summarization Guided by Latent Hierarchical Document Structure" ☆13 May 20, 2023 Updated 2 years ago
- Code for paper 'Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse' ☆13 Aug 2, 2024 Updated last year
- Performance testing for LLM inference services ☆44 Dec 17, 2023 Updated 2 years ago
- ☆11 Aug 21, 2023 Updated 2 years ago
- A very simple chat application using Spring Boot, Vue.js (in TypeScript), gRPC, gRPC-Web and EnvoyProxy. ☆10 May 20, 2019 Updated 6 years ago
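- A sentence-pair matching method based on Doc2vec and Word2vec ☆23 Jun 3, 2017 Updated 8 years ago
- Integrated management platform for scenic areas: a combination of ECharts and large-screen dashboards, with the large-screen width (percentage) and height (rem) adapting automatically ☆11 Apr 27, 2018 Updated 8 years ago
- A sample that converts the MobileSAM encoder/decoder to ONNX and runs inference ☆12 Apr 11, 2024 Updated 2 years ago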
- To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models ☆33 May 21, 2025 Updated 11 months ago
- Code for the paper "REV: Information-Theoretic Evaluation of Free-Text Rationales" ☆16 Aug 11, 2023 Updated 2 years ago
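- Codes, a simple and easy-to-use one-stop R&D management platform: free to use, local installation, R&D management, test management, digital dashboards, CI/CD, API testing, defect management, DevTestOps ☆29 Jun 19, 2023 Updated 2 years ago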
- ☆12 May 22, 2018 Updated 7 years ago
- Claude Agent SDK UI ☆49 Jan 13, 2026 Updated 3 months ago
- experimental H5P content for automated feedback on texts ☆17 Updated this week
- ☆22 Dec 7, 2021 Updated 4 years ago
- ☆24 Nov 23, 2021 Updated 4 years ago
- ☆10 Jul 18, 2024 Updated last year
- aigc evals ☆10 Dec 2, 2023 Updated 2 years ago