MoonshotAI / K2-Vendor-Verfier

Verify the precision of all Kimi K2 API vendors

☆139

Alternatives and similar repositories for K2-Vendor-Verfier

Users who are interested in K2-Vendor-Verfier are comparing it to the libraries listed below.

changjonathanc / llmproc
LLMProc: Unix-inspired runtime that treats LLMs as processes.
☆33 Updated 2 months ago
haizelabs / j1-micro
j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.
☆98 Updated 2 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆106 Updated 6 months ago
lmarena / p2l
Prompt-to-Leaderboard
☆254 Updated 4 months ago
codelion / pts
Pivotal Token Search
☆125 Updated 2 months ago
scaleapi / SWE-bench_Pro-os
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
☆107 Updated this week
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆190 Updated 2 weeks ago
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆294 Updated last month
ScalingIntelligence / tokasaurus
☆421 Updated last month
brendanhogan / picoDeepResearch
☆68 Updated 4 months ago
adobe-research / NoLiMa
Official repository for "NoLiMa: Long-Context Evaluation Beyond Literal Matching"
☆155 Updated 2 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆172 Updated 8 months ago
benchflow-ai / benchflow
AI benchmark runtime framework that allows you to integrate and evaluate AI tasks using Docker-based benchmarks.
☆158 Updated 4 months ago
QuixiAI / OpenChatML
☆161 Updated last month
kubernetes-bad / reward-composer
Lego for GRPO
☆29 Updated 4 months ago
Aider-AI / polyglot-benchmark
Coding problems used in aider's polyglot benchmark
☆180 Updated 9 months ago
Not-Diamond / RoRF
Routing on Random Forest (RoRF)
☆206 Updated last year
jd-3d / SOLOBench
☆133 Updated 4 months ago
lechmazur / confabulations
Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.
☆227 Updated last month
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆55 Updated 7 months ago
Mihaiii / backtrack_sampler
An easy-to-understand framework for LLM samplers that rewind and revise generated tokens
☆146 Updated 7 months ago
ServiceNow / Fast-LLM
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
☆226 Updated this week
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆77 Updated 6 months ago
commit-0 / commit0
Commit0: Library Generation from Scratch
☆167 Updated 4 months ago
sam-paech / slop-forensics
☆268 Updated 3 months ago
tokenbender / avataRL
rl from zero pretrain, can it be done? yes.
☆269 Updated this week
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆88 Updated 4 months ago
eqimp / hogwild_llm
Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache
☆124 Updated last month
haizelabs / Awesome-LLM-Judges
⚖️ Awesome LLM Judges ⚖️
☆128 Updated 4 months ago
Danau5tin / terminal-bench-rl
GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T…
☆259 Updated last month