A general framework for evaluating the performance of large language models (LLMs) based on a peer-review mechanism among LLMs
☆19Aug 3, 2024Updated last year
Alternatives and similar repositories for PRE
Users who are interested in PRE are comparing it to the libraries listed below
- An evaluation framework to test AI in a trial-and-error process. It is a simplified Natural Selection test.☆21Mar 11, 2025Updated 11 months ago
- SIGIR'22 paper: Axiomatically Regularized Pre-training for Ad hoc Search☆23May 24, 2023Updated 2 years ago
- The homepage for ConvSearch Dataset.☆14May 31, 2022Updated 3 years ago
- Repo. for RLCF.☆15Apr 1, 2024Updated last year
- Collecting publicly available distillation datasets based on DeepSeek-R1☆26Mar 12, 2025Updated 11 months ago
- Large Language Models as Evaluators for Recommendation Explanations (RecSys 2024 Reproducibility)☆20Aug 13, 2025Updated 6 months ago
- Code for AAAI 2024 paper Wikiformer☆19Dec 21, 2023Updated 2 years ago
- Code for I3 Retriever, accepted by CIKM'23.☆53Oct 22, 2023Updated 2 years ago
- Code for KERM: Incorporating Explicit Knowledge in Pre-trained Language Models for Passage Re-ranking, accepted at SIGIR 2022.☆19Oct 31, 2022Updated 3 years ago
- ☆26Jul 25, 2025Updated 7 months ago
- The official repo for our paper: LegalAgentBench: Evaluating LLM Agents in Legal Domain☆43Dec 30, 2024Updated last year
- Official code space for "SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development"☆61Oct 24, 2025Updated 4 months ago
- Code to reproduce THUIR's submissions for COLIEE 2023 Task1 and Task2☆28May 12, 2023Updated 2 years ago
- The official repo for our SIGIR'23 Full paper: Constructing Tree-based Index for Efficient and Effective Dense Retrieval☆28Jun 7, 2023Updated 2 years ago
- Test-time compute in information retrieval☆54Jul 8, 2025Updated 7 months ago
- The official repo for our SIGIR'23 Full paper: Structure-aware Pre-trained Language Model for Legal Case Retrieval☆98May 9, 2023Updated 2 years ago
- Blockchain-based product traceability system☆10Mar 11, 2021Updated 4 years ago
- ☆16Jun 12, 2025Updated 8 months ago
- Airline AI Agent with Langflow and DataStax Astra☆12Jan 14, 2025Updated last year
- Repository for tw.org site☆14Feb 11, 2026Updated 3 weeks ago
- Jason Meridth's blog☆13Updated this week
- Official codebase for NeurIPS 2022 paper End-to-end Learning to Index and Search in Large Output Spaces☆12Apr 19, 2023Updated 2 years ago
- In ancient Egypt the pelican was believed to possess the ability to prophesy safe passage in the underworld. Pelicans are ferocious eater…☆11Apr 7, 2023Updated 2 years ago
- Repository of paper "Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis" (ACL 2025 Main)☆19Jul 19, 2025Updated 7 months ago
- Task Wiki Opener☆10May 4, 2021Updated 4 years ago
- ☆15Sep 18, 2025Updated 5 months ago
- A Java implementation of word2vec☆10Apr 24, 2016Updated 9 years ago
- ☆10Sep 25, 2019Updated 6 years ago
- MusicYOLO framework uses the object detection model, YOLOx, to locate notes in the spectrogram.☆11Jan 29, 2022Updated 4 years ago
- ☆13May 22, 2024Updated last year
- Evals meant to evaluate language models' ability to reason over long contexts.☆10Sep 12, 2024Updated last year
- Confidence Regulation Neurons in Language Models (NeurIPS 2024)☆15Feb 1, 2025Updated last year
- A full stack typescript SAAS boilerplate with Chat, Auth (Langgraph, supabase), Payments (stripe), and AI Credits☆17May 23, 2025Updated 9 months ago
- CIKM 2022: Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models☆11Aug 4, 2022Updated 3 years ago
- Datasets of all trades on Polymarket's online prediction markets for the 2022 US midterm elections.☆13Nov 20, 2022Updated 3 years ago
- Scala+lift frontend for automatatutor.com☆10May 13, 2019Updated 6 years ago
- ☆16Jan 15, 2025Updated last year
- Experiments on Data Poisoning Regression Learning☆12Oct 5, 2020Updated 5 years ago
- ☆11Sep 30, 2023Updated 2 years ago