URS Benchmark: Evaluating LLMs on User Reported Scenarios
☆30 · May 30, 2025 · Updated 9 months ago
Alternatives and similar repositories for URS
Users interested in URS are comparing it to the libraries listed below.
- Modified Beam Search with periodic restart ☆12 · Sep 12, 2024 · Updated last year
- The backup repository for FairytaleQA dataset and paper "Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset f… ☆10 · May 30, 2023 · Updated 2 years ago
- An extended project of the LLM Compiler paper, focusing on developing LLM-based Autonomous Agents. ☆26 · Oct 22, 2024 · Updated last year
- ☆11 · Sep 19, 2025 · Updated 5 months ago
- ☆16 · Mar 3, 2024 · Updated 2 years ago
- ☆15 · Dec 3, 2024 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆17 · Jun 3, 2024 · Updated last year
- This repo is to demo the concept of lossless compression with Transformers as encoder and decoder. ☆14 · May 2, 2024 · Updated last year
- Automated testing and benchmarking for code generation agents. ☆18 · Jun 27, 2023 · Updated 2 years ago
- Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning [ICLR 2025] ☆50 · Jan 24, 2025 · Updated last year
- Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables ☆21 · May 18, 2025 · Updated 9 months ago
- Official code and dataset repository of KoBBQ (TACL 2024) ☆19 · May 13, 2024 · Updated last year
- Official Repository for "BlendX: Complex Multi-intent Detection with Blended Patterns" ☆27 · Jan 16, 2026 · Updated last month
- StrategyQA dataset translation ☆23 · Apr 12, 2024 · Updated last year
- Neural Unification for Logic Reasoning over Language ☆22 · Nov 15, 2021 · Updated 4 years ago
- ☆22 · Jan 3, 2025 · Updated last year
- A collection of public Korean instruction datasets for training language models. ☆19 · Jul 16, 2023 · Updated 2 years ago
- The DPAB-α Benchmark ☆32 · Jan 15, 2025 · Updated last year
- [ICLR 2025] 🚀 CodeMMLU Evaluator: A framework for evaluating LMs on the CodeMMLU MCQ benchmark. ☆29 · Apr 21, 2025 · Updated 10 months ago
- Code for paper 'Data-Efficient FineTuning' ☆28 · May 24, 2023 · Updated 2 years ago
- Evaluating language model responses with a reward model ☆29 · Feb 23, 2024 · Updated 2 years ago
- Study and research with your docs, media, and AI in one place ☆33 · Updated this week
- KLUE Benchmark 1st place (2021.12) solutions. (RE, MRC, NLI, STS, TC) ☆25 · Apr 11, 2022 · Updated 3 years ago
- Adding new tasks to T0 without catastrophic forgetting ☆33 · Oct 20, 2022 · Updated 3 years ago
- OpenAI Function Call Schema Composer and Executor from OpenAPI (Swagger) Document. ☆32 · Jan 6, 2025 · Updated last year
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆92 · Oct 30, 2024 · Updated last year
- ☆13 · Nov 5, 2024 · Updated last year
- ☆36 · Oct 4, 2023 · Updated 2 years ago
- [ACL 2025 Findings] Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts (As Huggingface Daily Papers: … ☆90 · Nov 23, 2025 · Updated 3 months ago
- ☆41 · Oct 3, 2023 · Updated 2 years ago
- [EACL 2023] CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification ☆42 · Apr 29, 2023 · Updated 2 years ago
- CVE-Factory ☆53 · Feb 13, 2026 · Updated 2 weeks ago
- [EMNLP 2023] Question Answering as Programming for Solving Time-Sensitive Questions ☆12 · Dec 18, 2023 · Updated 2 years ago
- ☆12 · Nov 22, 2024 · Updated last year
- A simple API that can generate various types of hexagon grids - returns GeoJSON data or load into PostGIS with performant JDBC. ☆10 · Aug 2, 2025 · Updated 7 months ago
- A Library for Scaling Mixed-Integer Optimization-Based Machine Learning. ☆12 · Jun 24, 2024 · Updated last year
- A simple PE parser that lists all import and export entries ☆12 · Oct 11, 2018 · Updated 7 years ago
- ☆16 · May 13, 2021 · Updated 4 years ago
- ☆19 · Jan 15, 2026 · Updated last month