A challenging aggregation benchmark for long-context models
☆41 · Feb 22, 2026 · Updated last month
Alternatives and similar repositories for oolong
Users that are interested in oolong are comparing it to the libraries listed below
- ☆34 · Nov 26, 2025 · Updated 3 months ago
- The code implementation for TTCS: Test-Time Curriculum Synthesis for Self-Evolving. ☆39 · Mar 8, 2026 · Updated 2 weeks ago
- PAHF Personalized Agent from Human Feedback ☆44 · Mar 6, 2026 · Updated 2 weeks ago
- FSPEC: The Spec-Driven, Multi-Agent Coding Factory. It is infrastructure for the "Dark Factory", the emerging model of fully autonomous so… ☆56 · Updated this week
- Some microbenchmarks and design docs before commencement ☆11 · Feb 1, 2021 · Updated 5 years ago
- On demand communication ☆32 · Mar 3, 2026 · Updated 2 weeks ago
- BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent ☆208 · Dec 11, 2025 · Updated 3 months ago
- Model and datasets for schema matching ☆14 · Jul 17, 2021 · Updated 4 years ago
- Transformers at any scale ☆42 · Jan 18, 2024 · Updated 2 years ago
- [Preprint] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments ☆191 · Jan 12, 2026 · Updated 2 months ago
- Kernel Playground - A playground to run large scale experiments on the Linux Kernel ☆17 · Nov 8, 2025 · Updated 4 months ago
- A Paddle reproduction of the paper ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information (ACL 2021) ☆10 · Nov 15, 2021 · Updated 4 years ago
- Efficient Transformers with Dynamic Token Pooling ☆68 · May 20, 2023 · Updated 2 years ago
- ☆13 · Mar 14, 2026 · Updated last week
- The official implementation of the paper "Mem-α: Learning Memory Construction via Reinforcement Learning" ☆189 · Dec 25, 2025 · Updated 2 months ago
- ☆26 · Mar 10, 2026 · Updated last week
- ☆40 · Jul 26, 2024 · Updated last year
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆18 · Mar 15, 2024 · Updated 2 years ago
- MathLex JavaScript math entry system ☆21 · Apr 29, 2025 · Updated 10 months ago
- ☆16 · Jun 25, 2025 · Updated 8 months ago
- Code and dataset for EMNLP 2022 Findings paper "Benchmarking Language Models for Code Syntax Understanding" ☆16 · Oct 24, 2022 · Updated 3 years ago
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- 📝 The official repository of "Rethinking Cross-Generator Image Forgery Detection through DINOv3" ☆21 · Dec 2, 2025 · Updated 3 months ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based … ☆11 · Mar 18, 2023 · Updated 3 years ago
- Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning ☆56 · Feb 24, 2026 · Updated 3 weeks ago
- moodist ☆25 · Mar 13, 2026 · Updated last week
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- Kinetics: Rethinking Test-Time Scaling Laws ☆86 · Jul 11, 2025 · Updated 8 months ago
- PROSE Public Benchmark Suite ☆32 · Sep 15, 2025 · Updated 6 months ago
- Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span … ☆14 · Aug 25, 2023 · Updated 2 years ago
- Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning" ☆11 · Jan 10, 2025 · Updated last year
- ☆26 · Nov 7, 2022 · Updated 3 years ago
- QuoteSum is a textual QA dataset containing Semi-Extractive Multi-source Question Answering (SEMQA) examples written by humans, based on … ☆13 · Mar 25, 2024 · Updated last year
- Official release of the CLVR Jaco Play Dataset, Dass et al. 2023 ☆17 · Apr 24, 2023 · Updated 2 years ago
- Data and code for paper "ODSum: New Benchmarks for Open Domain Multi-Document Summarization" ☆11 · Sep 20, 2024 · Updated last year
- PyTorch implementation for PaLM: A Hybrid Parser and Language Model. ☆10 · Jan 7, 2020 · Updated 6 years ago
- [ICLR 2026] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs ☆17 · May 21, 2025 · Updated 10 months ago
- Recursive Bayesian Networks ☆11 · May 11, 2025 · Updated 10 months ago
- https://arxiv.org/abs/2404.10917 ☆14 · Mar 18, 2025 · Updated last year