NanshineLoong/Self-Evolving-Benchmark

NanshineLoong / Self-Evolving-Benchmark

A framework for evolving and testing question-answering datasets with various models.

☆26

Alternatives and similar repositories for Self-Evolving-Benchmark

Users interested in Self-Evolving-Benchmark are comparing it to the repositories listed below.

saferlhf-v / saferlhf-v
☆23 · Jun 16, 2025 · Updated last year
ChengshuaiZhao0 / The-Wolf-Within
☆13 · Jul 16, 2026 · Updated last week
TIGER-AI-Lab / PixelWorld
The official code of "PixelWorld: Towards Perceiving Everything as Pixels" [TMLR25]
☆15 · Sep 12, 2025 · Updated 10 months ago
NUSTM / LLMs-Waver-In-Judgments
☆12 · Sep 23, 2024 · Updated last year
simonw / webvid-datasette
A Datasette instance for searching WebVid-10M
☆15 · Sep 30, 2022 · Updated 3 years ago
NEUIR / LegalDelta
[ICASSP '26] This is the code repo for our paper: LegalΔ: Enhancing Legal Reasoning in LLMs via Reinforcement Learning with Chain-of-Thou…
☆31 · Jul 1, 2026 · Updated 3 weeks ago
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆28 · Dec 23, 2024 · Updated last year
yzhang1918 / cikm2022rudi
Codes and data for CIKM 2022 paper "RuDi: Explaining Behavior Sequence Models by Automatic Statistics Generation and Rule Distillation"
☆12 · Aug 16, 2022 · Updated 3 years ago
lucidrains / learning-to-expire-pytorch
An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain
☆34 · Oct 30, 2020 · Updated 5 years ago
wangyu-ovo / MML
Code for the paper "Jailbreak Large Vision-Language Models Through Multi-Modal Linkage"
☆35 · Dec 6, 2024 · Updated last year
sijeh / Sticker820K
☆11 · Jun 12, 2023 · Updated 3 years ago
didiforgithub / SwarmAgent
🌟 SwarmAgent: A framework for simulating social group dynamics using multi-agent collaboration, aiding insights into collective behavior…
☆13 · Dec 5, 2023 · Updated 2 years ago
hobart07 / Step1X-Edit_train
☆14 · May 20, 2025 · Updated last year
zjunlp / EasyDetect
[ACL 2024] An Easy-to-use Hallucination Detection Framework for LLMs.
☆42 · Feb 25, 2025 · Updated last year
LTT-O / 3D-face-reconstruction
Something about 3D face reconstruction
☆19 · Mar 24, 2023 · Updated 3 years ago
jaypriyadarshi / BLEU-score
Calculates the BLEU score
☆11 · Jun 7, 2016 · Updated 10 years ago
xingyaoww / mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…
☆141 · Jun 4, 2024 · Updated 2 years ago
Carol-gutianle / Awesome-llm-unlearning
☆13 · Jun 17, 2024 · Updated 2 years ago
SqueezeAILab / LLM2LLM
[ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
☆196 · Mar 25, 2024 · Updated 2 years ago
FudanDISC / weakly-supervised-mVLP
Implementation of our ACL2023 paper: Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Langua…
☆19 · Jul 5, 2023 · Updated 3 years ago
leileqiTHU / Attacker
The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1
☆13 · Apr 23, 2025 · Updated last year
linjh1118 / Awesome-MLLM-For-Games
MLLM @ Game
☆17 · May 12, 2025 · Updated last year
Felixgithub2017 / CG-Eval
Chinese Generation Evaluation
☆13 · Aug 14, 2023 · Updated 2 years ago
zhw12 / BERTRL
☆65 · Jan 27, 2023 · Updated 3 years ago
sinkers-lan / HoloPredictPose
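This project uses deep learning to detect 3D human pose in real time and, based on that, to predict future human motion. It uses the mmpose framework and multiprocessing for fast back-end prediction, and displays the person's motion on a HoloLens 2 mixed-reality headset, achieving real-time capture, prediction, and display.
☆12 · Oct 30, 2023 · Updated 2 years ago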
krystalan / RAGtrans
[EMNLP 2025 Findings] Retrieval-Augmented Machine Translation with Unstructured Knowledge
☆15 · Sep 4, 2025 · Updated 10 months ago
lblankl / Short-RL
Short RL
☆19 · Apr 16, 2026 · Updated 3 months ago
JoeYing1019 / UltraTool
[ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
☆71 · Aug 5, 2025 · Updated 11 months ago
YanyuanSu / Resume-Corpus
☆21 · Mar 29, 2020 · Updated 6 years ago
menik1126 / UNComp
[EMNLP 2025🔥] UNComp: Can Matrix Entropy Uncover Sparsity? -- A Compressor Design from an Uncertainty-Aware Perspective
☆20 · Jan 7, 2026 · Updated 6 months ago
idstcv / InMaP
PyTorch Implementation for InMaP
☆12 · Oct 28, 2023 · Updated 2 years ago
akshat57 / how-do-llms-use-their-depth
☆19 · Nov 24, 2025 · Updated 8 months ago
bertiev / SimpleSafetyTests
☆19 · Mar 25, 2024 · Updated 2 years ago
facebookresearch / ToolVerifier
This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection.
☆23 · Mar 11, 2024 · Updated 2 years ago
Buildsoftwaresphere / Window
☆16
juntang-zhuang / torch_ACA
repo for paper: Adaptive Checkpoint Adjoint (ACA) method for gradient estimation in neural ODE
☆56 · Mar 13, 2021 · Updated 5 years ago
BRZ911 / ViTCoT
[ACM MM 2025] ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
☆18 · Jul 15, 2025 · Updated last year
NVlabs / AL-SSL
☆18 · Mar 19, 2023 · Updated 3 years ago
mathllm / MATH-V
[NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.
☆140 · May 16, 2025 · Updated last year