A framework for evolving and testing question-answering datasets with various models.
☆23 Feb 28, 2024 Updated 2 years ago
Alternatives and similar repositories for Self-Evolving-Benchmark
Users who are interested in Self-Evolving-Benchmark are comparing it to the libraries listed below.
- ☆12 Jul 16, 2025 Updated 9 months ago
- ☆12 Sep 23, 2024 Updated last year
- A Datasette instance for searching WebVid-10M ☆15 Sep 30, 2022 Updated 3 years ago
- Chinese NER based on RNN ☆12 Oct 14, 2016 Updated 9 years ago
- Introduction about AWESOME_ENTROPY+LRM_PAPERS ☆30 Dec 16, 2025 Updated 4 months ago
- ☆10 Jun 12, 2023 Updated 2 years ago
- An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain ☆34 Oct 30, 2020 Updated 5 years ago
- ☆34 Jun 28, 2025 Updated 9 months ago
- Codes and data for CIKM 2022 paper "RuDi: Explaining Behavior Sequence Models by Automatic Statistics Generation and Rule Distillation" ☆12 Aug 16, 2022 Updated 3 years ago
- The source code of Paper "PathQG: Neural Question Generation from Facts". ☆23 Jan 4, 2021 Updated 5 years ago
- ☆30 Jan 11, 2026 Updated 3 months ago
- MLLM @ Game ☆16 May 12, 2025 Updated 11 months ago
- [ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models" ☆26 Mar 4, 2025 Updated last year
- [ACL 2024] An Easy-to-use Hallucination Detection Framework for LLMs. ☆39 Feb 25, 2025 Updated last year
- Something about 3D face reconstruction ☆19 Mar 24, 2023 Updated 3 years ago
- To calculate the BLEU score ☆11 Jun 7, 2016 Updated 9 years ago
- ☆13 Jun 17, 2024 Updated last year
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement ☆195 Mar 25, 2024 Updated 2 years ago
- ☆17 Mar 22, 2024 Updated 2 years ago
- The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1 ☆13 Apr 23, 2025 Updated 11 months ago
- Chinese Generation Evaluation ☆13 Aug 14, 2023 Updated 2 years ago
- A collection of important papers on Generalizable Diffusion-generated Image Detection ☆17 Mar 20, 2025 Updated last year
- [ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios ☆71 Aug 5, 2025 Updated 8 months ago
- This repository contains the official implementation of the paper: "EHRStruct: A Comprehensive Benchmark Framework for Evaluating Large L… ☆72 Dec 18, 2025 Updated 4 months ago
- An asymmetric 1v1 multiplayer game using Unreal Engine ☆18 Feb 25, 2017 Updated 9 years ago
- Code for ICCV 2023 work "Generalized Few-Shot Point Cloud Segmentation Via Geometric Words" ☆14 Sep 26, 2023 Updated 2 years ago
- Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models ☆46 Sep 19, 2025 Updated 7 months ago
- Insight Data Engineering fellow project ☆16 Nov 14, 2016 Updated 9 years ago
- This is the official project of paper: Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conver… ☆25 Nov 18, 2024 Updated last year
- (wip) Use LAION-AI's CLIP "conditioned prior" to generate CLIP image embeds from CLIP text embeds. ☆29 Jul 14, 2022 Updated 3 years ago
- [IEEE S&P 22] "LinkTeller: Recovering Private Edges from Graph Neural Networks via Influence Analysis" by Fan Wu, Yunhui Long, Ce Zhang, … ☆23 Sep 7, 2021 Updated 4 years ago
- This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection. ☆22 Mar 11, 2024 Updated 2 years ago
- The multi-view version of MonoDETR on nuScenes dataset ☆21 Nov 4, 2022 Updated 3 years ago
- ☆16 Updated this week
- CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models ☆16 Oct 14, 2024 Updated last year
- [ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents ☆17 Oct 12, 2024 Updated last year
- A guide for PhD students by a not so good PhD student ☆17 Dec 31, 2023 Updated 2 years ago
- Class Prior Estimation in Active Positive and Unlabeled Learning ☆16 Mar 24, 2021 Updated 5 years ago
- A pytorch reimplementation of KL-Loss (CVPR'2019) ☆15 Oct 15, 2023 Updated 2 years ago