openmedlab / PULSE-EVAL
PULSE-EVAL
☆23Jan 12, 2024Updated 2 years ago
Alternatives and similar repositories for PULSE-EVAL
Users that are interested in PULSE-EVAL are comparing it to the libraries listed below
- A tool for training sentence embeddings, with support for CoSENT, SimCSE, and InfoNCE☆21Jun 17, 2025Updated 7 months ago
- ☆28Aug 2, 2023Updated 2 years ago
- PULSE: Pretrained and Unified Language Service Engine☆494Dec 26, 2023Updated 2 years ago
- Counting-Stars (★)☆83Nov 24, 2025Updated 2 months ago
- ☆11Nov 12, 2024Updated last year
- The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed …☆11Sep 27, 2024Updated last year
- We systematically studied the factors that influence benchmarks generated by LLMs. By using our code, you can generate high-quality QA datas…☆20May 20, 2025Updated 8 months ago
- ☆16Aug 23, 2023Updated 2 years ago
- Code for ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context☆18Nov 15, 2024Updated last year
- [ACL 2025 Main] Open-source toolkit for automatic evaluation of text-to-image generation task, including training & test datasets and a d…☆16Jul 5, 2025Updated 7 months ago
- Cog wrapper for playgroundai/playground-v2.5-1024px-aesthetic☆17Nov 25, 2024Updated last year
- Instruction Following Eval☆15Jan 16, 2025Updated last year
- Benchmarking LLM Inference Speeds☆13Feb 4, 2026Updated last week
- Get the media stream from the Dahua/Haikang IPC SDK and demux it into video and audio ES☆12Nov 15, 2015Updated 10 years ago
- CMB, A Comprehensive Medical Benchmark in Chinese☆230Mar 27, 2025Updated 10 months ago
- I don't want to maintain this project, the code probably won't compile or run. Archived.☆13Feb 25, 2024Updated last year
- A native-Chinese, multi-level evaluation benchmark for text-to-video generation☆18Jul 8, 2024Updated last year
- SRS is an industrial-strength live cluster, with simple code and best conceptual integrity.☆11Nov 14, 2021Updated 4 years ago
- The free energy principle☆17Feb 16, 2025Updated 11 months ago
- Pan-Tumor Radiology Foundation Model Utilizing Synthetic Training Data for Advanced Oncological Insights☆86Jan 6, 2026Updated last month
- Joint learning of object and action detectors☆15Nov 5, 2019Updated 6 years ago
- ☆18Nov 30, 2025Updated 2 months ago
- A model evaluation system based on OpenCompass that provides a front-end web UI so users can run evaluations on their own.☆24Aug 25, 2025Updated 5 months ago
- ☆13Oct 28, 2023Updated 2 years ago
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types☆24Nov 29, 2024Updated last year
- The official repository of the paper "The Digital Cybersecurity Expert: How Far Have We Come?" presented in IEEE S&P 2025☆24May 21, 2025Updated 8 months ago
- LLM evaluation.☆16Nov 7, 2023Updated 2 years ago
- Source code for “悟道” (WuDao)☆21Aug 24, 2021Updated 4 years ago
- This project implements sequence tagging with a CRF+BiLSTM model.☆18Nov 8, 2019Updated 6 years ago
- Source code of ACL2022 "Headed-Span-Based Projective Dependency Parsing" and "Combining (second-order) graph-based and headed-span-based …☆16Jan 12, 2023Updated 3 years ago
- The source code for ACL 2021 paper☆19Oct 9, 2021Updated 4 years ago
- [KDD2024 ADS Track] RareBench: Can LLMs Serve as Rare Diseases Specialists?☆32Nov 28, 2025Updated 2 months ago
- Dataset and Code for ACL 2023 paper: "IM-TQA: A Chinese Table Question Answering Dataset with Implicit and Multi-type Table Structures". …☆26Aug 6, 2024Updated last year
- Source code of COLING 2022 paper "A Contrastive Cross-channel Data Augmentation Framework for Aspect-based Sentiment Analysis"☆22Feb 18, 2023Updated 2 years ago
- Task Complexity Classifier using Transformer-based NLP model based on Bloom's Taxonomy☆35Aug 18, 2025Updated 5 months ago
- Reproducible Language Agent Research☆33Jun 25, 2025Updated 7 months ago
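- This paper proposes a safety assessment benchmark for Chinese LLMs based on “文心一言” (ERNIE Bot), covering 8 typical safety scenarios and 6 types of instruction attack. It also proposes a safety assessment framework and process that uses test prompts written by hand and collected from open-source data, and that combines human intervention with the strong evaluation ability of an LLM acting as a “co-evaluator”.☆32Sep 1, 2023Updated 2 years ago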
- ☆27Jun 26, 2017Updated 8 years ago
- ☆31Feb 9, 2025Updated last year