vincentkoc/tiny_qa_benchmark_pp

Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing and regression-catching.

☆16

Alternatives and similar repositories for tiny_qa_benchmark_pp

Users who are interested in tiny_qa_benchmark_pp are comparing it to the repositories listed below.

JIA-Lab-research / Q-LLM
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
☆54 · Jul 16, 2024 · Updated last year
eryk-mazus / sigh
Seamless Voice Interactions with LLMs
☆12 · Oct 28, 2023 · Updated 2 years ago
fsndzomga / open_source_lrm
☆10 · Oct 24, 2024 · Updated last year
graphcore-research / jax-scalify
JAX Scalify: end-to-end scaled arithmetics
☆18 · Oct 30, 2024 · Updated last year
cleanlab / structured-output-benchmark
A Structured Output Benchmark whose 'ground-truth' is actually right
☆19 · Dec 5, 2025 · Updated 6 months ago
psunlpgroup / GreaTer
Source code for GreaTer ICLR 2025 - Gradient Over Reasoning makes Smaller Language Models Strong Prompt Optimizers
☆36 · Apr 18, 2025 · Updated last year
Scientific-Computing-Lab / MPI-rigen
MPI Code Generation through Domain-Specific Language Models
☆16 · Nov 19, 2024 · Updated last year
dair-ai / llm-evaluator
Example for Logging LLM Evaluator Prompt Responses
☆18 · Aug 14, 2023 · Updated 2 years ago
Helw150 / levanter
Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
☆16 · Jun 16, 2024 · Updated 2 years ago
psg-mit / nightjarpy
Python library to add support for embedding natural code in Python with shared program state.
☆30 · Jan 20, 2026 · Updated 5 months ago
raybears / cot-transparency
Improving transparency of large language models' reasoning
☆15 · Nov 25, 2025 · Updated 7 months ago
camenduru / bria-rmbg-jupyter
☆15 · Mar 12, 2024 · Updated 2 years ago
jinzhuoran / RAG-RewardBench
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆18 · Dec 19, 2024 · Updated last year
ArmelRandy / tree-of-problems
[EMNLP 2024] Tree of Problems: Improving structured problem solving with compositionality
☆20 · Mar 4, 2025 · Updated last year
facebookresearch / NeuralMemory
A Data Source for Reasoning Embodied Agents
☆19 · Sep 18, 2023 · Updated 2 years ago
yale-nlp / refdpo
☆16 · Jul 23, 2024 · Updated last year
uq-project / UQ
UQ: Assessing Language Models on Unsolved Questions
☆30 · Aug 26, 2025 · Updated 10 months ago
JackVinati / WaveWizard
A Gradio app for analyzing audio files to determine true sample rate and bit depth.
☆20 · Sep 17, 2024 · Updated last year
shangdatalab / Deep-Contam
Official implementation of Data Contamination Can Cross Language Barriers
☆12 · Sep 11, 2024 · Updated last year
delyan-boychev / imaginet
☆12 · Apr 25, 2026 · Updated 2 months ago
UMass-Embodied-AGI / genome
☆16 · Apr 10, 2025 · Updated last year
iulia-b10 / multilingual-embedding-models
☆20 · Jan 27, 2024 · Updated 2 years ago
clovaai / TVQ-VAE
Official pytorch implementation for TVQ-VAE
☆12 · Feb 27, 2024 · Updated 2 years ago
Blackzxy / LoGAH
☆22 · Sep 29, 2024 · Updated last year
01yzzyu / wikiautogen
[ICCV2025] WikiAutoGen official page
☆25 · Feb 6, 2026 · Updated 4 months ago
Tufalabs / TextbooksToRL
☆29 · Aug 27, 2025 · Updated 10 months ago
Rose-STL-Lab / AutoSTPP
Automatic Integration for Neural Spatio-Temporal Point Process models (AI-STPP) is a new paradigm for exact, efficient, non-parametric inf…
☆25 · Oct 14, 2024 · Updated last year
Hao840 / ADEM-VL
PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"
☆21 · Oct 28, 2024 · Updated last year
CassioML / langchain-flare-pdf-qa-demo
PDF FLARE demo with Langchain and Cassandra as Vector Store
☆14 · Nov 15, 2023 · Updated 2 years ago
karminski / rpg-maker-agent
Give the AI an article and it automatically generates a playable RPG Maker MZ game
☆88 · Mar 5, 2026 · Updated 3 months ago
guyyariv / LaMI
[ACL 2026 Oral] Official implementation of LaMI: Augmenting Large Language Models via Late Multi-Image Fusion
☆19 · May 18, 2026 · Updated last month
formll / resolving-scaling-law-discrepancies
☆19 · Nov 4, 2025 · Updated 7 months ago
yyyyychen / LowMemoryBP
The official implementation of the paper "Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation"
☆21 · Dec 10, 2024 · Updated last year
qcznlp / uncertainty_attack
☆23 · Sep 2, 2025 · Updated 9 months ago
swarnaHub / System-1.x
PyTorch code for System-1.x: Learning to Balance Fast and Slow Planning with Language Models
☆25 · Jul 22, 2024 · Updated last year
karminski / awesome-llm-benchmark-prompts
☆65 · May 21, 2026 · Updated last month
RUCAIBox / GPO
The official GitHub page for "Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with …
☆30 · Dec 12, 2024 · Updated last year
linhaowei1 / kumo
☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models
☆20 · Jun 4, 2025 · Updated last year
chenllliang / MMEvalPro
[NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs
☆25 · Sep 26, 2024 · Updated last year