uq-project/UQ

UQ: Assessing Language Models on Unsolved Questions

☆30

Alternatives and similar repositories for UQ

Users who are interested in UQ are comparing it to the repositories listed below.

reka-ai / research-eval
A benchmark to evaluate search-augmented LLMs
☆17 · Aug 28, 2025 · Updated 10 months ago
datarubrics / datarubrics
DataRubrics, a structured framework for assessing the quality of both human- and model-generated datasets. Leveraging recent advances in …
☆17 · Jun 6, 2025 · Updated last year
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆145 · May 6, 2026 · Updated 2 months ago
BatsResearch / crosslingual-test-time-scaling
Crosslingual Reasoning through Test-Time Scaling
☆21 · May 13, 2025 · Updated last year
BryceZhuo / HybridNorm
The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆19 · Mar 7, 2025 · Updated last year
tongxuluo / LeaP
Code, Data and Model for Paper "Learning from Peers in Reasoning Models"
☆26 · May 13, 2025 · Updated last year
TuringEyeTest / TuringEyeTest
Pixels, Patterns, but no Poetry: To See the World like Humans
☆18 · Aug 11, 2025 · Updated 11 months ago
morse-benchmark / morse-500
☆31 · May 21, 2026 · Updated 2 months ago
assafbk / OPRM
Overflow Prevention Enhances Long-Context Recurrent LLMs (COLM 2025)
☆18 · Jul 8, 2025 · Updated last year
IBM / analog-foundation-models
Code for paper "Analog Foundation Models"
☆36 · Mar 25, 2026 · Updated 4 months ago
drbh / yamoe
🔀 yet another mixture of experts
☆23 · Jun 5, 2026 · Updated last month
kuzudb / dspy-kuzu-demo
Intro to using DSPy with Kuzu to enrich the data within the Nobel Laureate mentorship network
☆16 · Sep 16, 2025 · Updated 10 months ago
waterhorse1 / Natural-language-RL
Natural Language Reinforcement Learning
☆101 · Jul 30, 2025 · Updated 11 months ago
chenghands-on / Dreamer_assemble
An assemble of various world model including dreamer v2 and v3
☆10 · Sep 9, 2023 · Updated 2 years ago
lliu606 / COSMOS
☆20 · Feb 2, 2026 · Updated 5 months ago
JHU-CLSP / RATIONALYST
Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044
☆36 · Oct 3, 2024 · Updated last year
multimodal-art-projection / CriticLean
☆50 · Aug 5, 2025 · Updated 11 months ago
nikhilvyas / SOAP_MUON
Combining SOAP and MUON
☆23 · Feb 11, 2025 · Updated last year
RWKV / RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best…
☆62 · Mar 17, 2025 · Updated last year
inclusionAI / GroveMoE
☆24 · Aug 20, 2025 · Updated 11 months ago
firstbatchxyz / mem-agent
Memory Agent monorepo
☆89 · Oct 9, 2025 · Updated 9 months ago
YuejiangLIU / csl
Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts
☆15 · Feb 26, 2024 · Updated 2 years ago
Lossfunk / KernelBench-v2
KernelBench v2: Can LLMs Write GPU Kernels? - Benchmark with Torch -> Triton (and more!) problems
☆24 · Jul 4, 2025 · Updated last year
facebookresearch / zero
PyTorch Implementation of Zero-Shot Vision Encoder Grafting via LLM Surrogates [ICCV'25]
☆54 · Jul 10, 2025 · Updated last year
rll-research / finetune-vs-metarl
☆14 · May 31, 2022 · Updated 4 years ago
allenai / autodiscovery-neurips
Official code for NeurIPS 2025 paper "AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise"
☆195 · Jul 2, 2026 · Updated 3 weeks ago
metal-chart-generation / metal
☆44 · May 29, 2025 · Updated last year
haon-chen / mmE5
☆59 · Feb 27, 2025 · Updated last year
fannie1208 / GLIND
[ICML2024] Learning Divergence Fields for Shift-Robust Graph Representations
☆11 · Aug 15, 2024 · Updated last year
bdusell / stack-attention
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18 · Mar 15, 2024 · Updated 2 years ago
kmswin1 / Syntriever
"Syntriever: How to Train Your Retriever with Synthetic Data from LLMs" the Nations of the Americas Chapter of the Association for Comput…
☆29 · Mar 5, 2025 · Updated last year
VITA-Group / Junk_DNA_Hypothesis
[ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So…
☆16 · Apr 21, 2025 · Updated last year
Mercor-Intelligence / apex-evals
☆15 · Jun 19, 2026 · Updated last month
NJU-LINK / DR3-Eval
☆39 · May 7, 2026 · Updated 2 months ago
swiss-ai / parity-aware-bpe
Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization [ACL 2026]
☆20 · Apr 18, 2026 · Updated 3 months ago
METR / Measuring-Early-2025-AI-on-Exp-OSS-Devs
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity: https://metr.org/blog/2025-07-10-early-2025-ai-e…
☆16 · Feb 23, 2026 · Updated 5 months ago
thunlp / SparsingLaw
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
☆32 · Nov 12, 2024 · Updated last year
chentong0 / rl-binary-rar
Official repo for "Binary Retrieval-augmented Reward Mitigates Hallucinations"
☆15 · Nov 13, 2025 · Updated 8 months ago
complex-reasoning / RPG
[ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)
☆76 · Jun 29, 2026 · Updated 3 weeks ago