NVlabs / Jet-Nemotron
☆709 Updated last week
Alternatives and similar repositories for Jet-Nemotron
Users who are interested in Jet-Nemotron are comparing it to the repositories listed below.
- ☆1,226 Updated 3 weeks ago
- Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model … ☆579 Updated 2 weeks ago
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆569 Updated 2 weeks ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆851 Updated 2 weeks ago
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆247 Updated last month
- QeRL enables RL for 32B LLMs on a single H100 GPU. ☆459 Updated last week
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆527 Updated 2 weeks ago
- Research code artifacts for Code World Model (CWM) including inference tools, reproducibility, and documentation. ☆751 Updated 2 months ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆304 Updated last month
- codes for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004) ☆687 Updated last month
- VPTQ, A Flexible and Extreme low-bit quantization algorithm ☆668 Updated 7 months ago
- Official implementation of "Continuous Autoregressive Language Models" ☆646 Updated last week
- dLLM: Simple Diffusion Language Modeling ☆1,069 Updated this week
- ☆1,242 Updated 2 weeks ago
- ☆917 Updated last month
- Sparse Inferencing for transformer based LLMs ☆215 Updated 3 months ago
- Efficient LLM Inference over Long Sequences ☆392 Updated 5 months ago
- ToolOrchestra is an end-to-end RL training framework for orchestrating tools and agentic workflows. ☆289 Updated this week
- Advanced quantization toolkit for LLMs and VLMs. Native support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Bits and seamless integration with … ☆735 Updated this week
- A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time! ☆1,172 Updated 10 months ago
- ☆843 Updated 2 months ago
- ☆202 Updated 11 months ago
- Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation (NeurIPS 2025) ☆522 Updated 2 months ago
- An early research stage expert-parallel load balancer for MoE models based on linear programming. ☆433 Updated 2 weeks ago
- Dream 7B, a large diffusion language model ☆1,094 Updated 2 weeks ago
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models ☆479 Updated 3 months ago
- Self-Adapting Language Models ☆1,575 Updated 4 months ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆135 Updated 3 months ago
- dInfer: An Efficient Inference Framework for Diffusion Language Models ☆331 Updated last week
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆352 Updated 5 months ago