☆38 · Feb 16, 2024 · Updated 2 years ago
Alternatives and similar repositories for LLM-evaluation-datasets
Users who are interested in LLM-evaluation-datasets are comparing it to the repositories listed below.
- An exploration of Android App Functions ☆17 · May 26, 2025 · Updated 11 months ago
- An Interactive Hex-Rays Microcode Explorer ☆17 · Feb 8, 2024 · Updated 2 years ago
- Writeup and exploit for CVE-2025-22441: Privilege escalation from installed app to SystemUI process on Android due to pass of untrusted A… ☆100 · Oct 8, 2025 · Updated 6 months ago
- ☆19 · Sep 7, 2025 · Updated 7 months ago
- Cross-Site Scripting (XSS) is a common vulnerability that allows attackers to inject malicious scripts into web pages viewed by users. In… ☆11 · Sep 10, 2024 · Updated last year
- Coverage gathering JVMTI agent for Android ☆26 · Oct 11, 2023 · Updated 2 years ago
- ☆14 · May 7, 2024 · Updated last year
- Red Team AI Benchmark: Evaluating Uncensored LLMs for Offensive Security ☆36 · Dec 25, 2025 · Updated 4 months ago
- MIT IEEE URTC 2024. GSET 2024. Repository for the "MBASED: Practical Simplifications of Mixed Boolean-Arithmetic Obfuscation". A Binary N… ☆42 · Aug 8, 2025 · Updated 8 months ago
- Code for the paper "Watermarking Makes Language Models Radioactive" ☆22 · Oct 25, 2024 · Updated last year
- Binary Ninja deobfuscation plugin ☆21 · Jul 23, 2025 · Updated 9 months ago
- How a leaked JWT secret inside a JavaScript file led to full admin access, and why most devs still don't see it coming. ☆15 · Jul 22, 2025 · Updated 9 months ago
- KeySentry – Find leaked API keys & secrets in any GitHub repo. No mercy. ☆38 · Aug 17, 2025 · Updated 8 months ago
- The official repo for "OpenMoE 2: Sparse Diffusion Language Models". ☆56 · Dec 28, 2025 · Updated 4 months ago
- LogicBench is a natural language question-answering dataset consisting of 25 different reasoning patterns spanning over propositional, fi… ☆38 · May 2, 2024 · Updated last year
- [NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents ☆74 · Apr 24, 2026 · Updated last week
- Collection of Frida scripts ☆35 · Feb 6, 2026 · Updated 2 months ago
- ☆48 · Feb 10, 2025 · Updated last year
- ☆33 · Sep 13, 2024 · Updated last year
- 🥇 Amazon Nova AI Challenge Winner - ASTRA emerged victorious as the top attacking team in Amazon's global AI safety competition, defeati… ☆70 · Updated this week
- Code for benchmarking single-cell foundation models (scGPT, scBERT, and Geneformer) on the cell-type annotation task using skewed single… ☆15 · Dec 8, 2024 · Updated last year
- MoCLE (First MLLM with MoE for instruction customization and generalization!) (https://arxiv.org/abs/2312.12379) ☆46 · Jul 1, 2025 · Updated 10 months ago
- This is the official implementation of TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data ☆13 · Jul 21, 2024 · Updated last year
- A secret detection tool ☆40 · Mar 1, 2026 · Updated 2 months ago
- Extension for CoEdPilot ☆21 · Feb 25, 2026 · Updated 2 months ago
- iOS application class-dump using Frida ☆40 · Apr 28, 2023 · Updated 3 years ago
- Undergraduate innovation project: hierarchical attention machine translation ☆17 · Apr 12, 2021 · Updated 5 years ago
- ☆13 · Jun 4, 2023 · Updated 2 years ago
- A polyglot static analysis engine, based on Joern, for detecting vulnerabilities in the native extensions of scripting languages. ☆21 · Sep 1, 2025 · Updated 8 months ago
- Chain of Images for Intuitively Reasoning ☆10 · Nov 29, 2023 · Updated 2 years ago
- ☆30 · Aug 21, 2025 · Updated 8 months ago
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation ☆91 · Nov 13, 2024 · Updated last year
- Indirect Prompt Injection Methodology (IPIM) - A structured process which security professionals can use to find Indirect Prompt Injectio… ☆21 · Jul 28, 2025 · Updated 9 months ago
- A data construction and evaluation framework to quantify privacy norm awareness of language models (LMs) and emerging privacy risk of LM … ☆44 · Mar 4, 2025 · Updated last year
- This NDK can run on Linux / MacOS / Windows / FreeBSD / OpenBSD 7.3 / NetBSD for both the x86_64 and AARCH64 architectures. ☆48 · Sep 29, 2025 · Updated 7 months ago
- ☆23 · Oct 14, 2024 · Updated last year
- AgenTracer: A Lightweight Failure Attributor for Agentic Systems ☆90 · Nov 12, 2025 · Updated 5 months ago
- [COLING Demos 2025] an Easy-to-use Tool for Comprehensive Response Evaluation of LLMs ☆38 · Mar 4, 2025 · Updated last year
- The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed … ☆11 · Sep 27, 2024 · Updated last year