Hannibal046/GPT-OSS-BrowseCompPlus-Eval

Hannibal046 / GPT-OSS-BrowseCompPlus-Eval

Evaluating GPT-OSS on BrowseComp-Plus with Native Browsing Tools

☆20

Alternatives and similar repositories for GPT-OSS-BrowseCompPlus-Eval

Users who are interested in GPT-OSS-BrowseCompPlus-Eval are comparing it to the repositories listed below.

texttron / BrowseComp-Plus
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent (ACL 2026 Main)
☆319 · May 28, 2026 · Updated 2 months ago
texttron / RISE
Retrieving Interaction SpacE for Agentic Search
☆27 · Jun 8, 2026 · Updated last month
IBM / ensemble-instruct
codebase release for EMNLP2023 paper publication
☆19 · Sep 18, 2025 · Updated 10 months ago
fresh-stack / freshstack
This repository helps you evaluate your models on the FreshStack benchmark!
☆34 · Dec 9, 2025 · Updated 7 months ago
RedSearchAgent / DeepTraceHub
RedSearcher's framework for deep search agent trajectory synthesis, QA filtering, and model evaluation, supporting ReACT and DeepSeek-sty…
☆23 · Feb 26, 2026 · Updated 5 months ago
hkust-nlp / deepsearch-tts
Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
☆21 · Oct 8, 2025 · Updated 9 months ago
xlang-ai / BRIGHT
[ICLR 2025] BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
☆210 · Sep 13, 2025 · Updated 10 months ago
hpclab / LtR-Tutorial
Efficiency/Effectiveness Trade-offs in Learning to Rank
☆12 · Sep 11, 2018 · Updated 7 years ago
gvalvano / adversarial-test-time-training
Code for the papers: "Stop Throwing Away Discriminators! Re-using Adversaries for Test-Time Training", Valvano et al., DART 2021; and "Re…
☆10 · Jan 20, 2022 · Updated 4 years ago
StigLidu / TURN
[ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference"
☆23 · Feb 16, 2025 · Updated last year
zjunlp / SemEval2021Task4
The 4th rank system of the SemEval 2021 Task4.
☆10 · May 7, 2022 · Updated 4 years ago
oceanumeric / EnteRAG
A RAG that can scale 🧑🏻‍💻
☆11 · May 28, 2024 · Updated 2 years ago
ielab / fpdgd-ictir2021
Implementation and results for ICTIR2021 paper: Effective and Privacy-preserving Federated Online Learning to Rank
☆11 · Jul 24, 2021 · Updated 5 years ago
jkkummerfeld / berkeley-coreference-analyser
A tool for classifying errors in coreference resolution
☆29 · Jun 27, 2023 · Updated 3 years ago
Saibo-creator / transformers-CFG
☆10 · Mar 1, 2025 · Updated last year
Furyton / GR-as-MVDR
[SIGIR'24] Generative Retrieval as Multi-Vector Dense Retrieval
☆36 · Oct 18, 2024 · Updated last year
McGill-NLP / instruct-qa
Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering"
☆87 · Aug 12, 2024 · Updated last year
JiwooKimAR / dmath
☆12 · Feb 16, 2024 · Updated 2 years ago
electron-shaders / MineDraft
☆38 · Jun 23, 2026 · Updated last month
ielab / llm-qlm
Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking
☆17 · Oct 26, 2023 · Updated 2 years ago
aliborji / ObjectNetReanalysis
reanalysis of the ObjectNet paper and our annotations and code
☆16 · Mar 4, 2021 · Updated 5 years ago
hrwise-nlp / AppBench
This is for EMNLP 2024 Paper: AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction
☆16 · Nov 4, 2024 · Updated last year
csiro-mlai / dl_hpc_starter_pack
pip install the deep learning & HPC starter pack to begin your project.
☆12 · Nov 6, 2024 · Updated last year
PrimeIntellect-ai / INTELLECT-MATH
A 7B parameter model for mathematical reasoning
☆42 · Updated this week
TIGER-AI-Lab / FIM-Midtraining
☆18 · Jul 15, 2026 · Updated 2 weeks ago
xiye17 / TextualExplInContext
The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022)
☆16 · Feb 11, 2023 · Updated 3 years ago
PKU-ML / Message-Passing-Contrastive-Learning
Official Code for ICLR 2023 Paper: A Message Passing Perspective on Learning Dynamics of Contrastive Learning
☆11 · Mar 9, 2023 · Updated 3 years ago
NodeBB / nodebb-plugin-emoji
NodeBB Plugin enabling emoji as seen on http://www.emoji-cheat-sheet.com
☆14 · Jul 17, 2026 · Updated last week
mdtux89 / amr-evaluation
Evaluation metrics to compare AMR graphs based on Smatch
☆29 · Feb 10, 2020 · Updated 6 years ago
hkust-nlp / WebExplorer
The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents"
☆120 · Sep 29, 2025 · Updated 10 months ago
Arenaa / Accelerated-Generation-Techniques
This repository contains papers for a comprehensive survey on accelerated generation techniques in Large Language Models (LLMs).
☆11 · May 24, 2024 · Updated 2 years ago
boberle / corefconversion
Conversion scripts for coreference
☆29 · Sep 30, 2024 · Updated last year
open-compass / ProSA
[EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
☆29 · May 22, 2025 · Updated last year
xinyan-cxy / EmpathyAgent
☆15 · Mar 18, 2025 · Updated last year
facebookresearch / ReasonIR
Official repository for paper "ReasonIR Training Retrievers for Reasoning Tasks".
☆230 · Jul 2, 2026 · Updated 3 weeks ago
angie-chen55 / pref-learning-ranking-acc
☆13 · Jun 4, 2024 · Updated 2 years ago
EmbolismSoil / KNLP
C++ natural language processing library
☆14 · Jan 22, 2020 · Updated 6 years ago
RonDen / PoemKGSpider
Crawls classical Chinese poetry and builds a knowledge graph and analysis applications on top of it
☆10 · Apr 27, 2021 · Updated 5 years ago
SteinOveHelset / codingnews
☆10 · Jan 31, 2021 · Updated 5 years ago