NVlabs / Jet-Nemotron
☆403 · Updated this week
Alternatives and similar repositories for Jet-Nemotron
Users interested in Jet-Nemotron are comparing it to the libraries listed below.
- GRadient-INformed MoE ☆265 · Updated 11 months ago
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆178 · Updated 2 weeks ago
- Sparse inferencing for transformer-based LLMs ☆197 · Updated 2 weeks ago
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆518 · Updated 3 weeks ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆120 · Updated 2 weeks ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆289 · Updated this week
- LLM inference on consumer devices ☆124 · Updated 5 months ago
- Training teacher models with reinforcement learning to teach LLMs how to reason for test-time scaling ☆329 · Updated 2 months ago
- Verification of Google DeepMind's AlphaEvolve 48-multiplication algorithm for 4×4 matrices, the first improvement in matrix multiplication in 56 years ☆118 · Updated 2 months ago
- Code for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004) ☆501 · Updated 2 weeks ago
- All information and news about the Falcon-H1 series ☆83 · Updated last week
- Samples of good AI-generated CUDA kernels ☆89 · Updated 2 months ago
- Lightweight toolkit package to train and fine-tune 1.58-bit language models ☆83 · Updated 3 months ago
- ☆306 · Updated last week
- ☆148 · Updated 2 months ago
- Training-free post-training efficient sub-quadratic-complexity attention, implemented with OpenAI Triton ☆145 · Updated this week
- Code to train and evaluate Neural Attention Memory Models to obtain universally applicable memory systems for transformers ☆319 · Updated 10 months ago
- ☆290 · Updated 3 weeks ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆97 · Updated 3 weeks ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆298 · Updated 3 months ago
- PyTorch implementation of models from the Zamba2 series ☆184 · Updated 7 months ago
- ☆100 · Updated 3 weeks ago
- ☆55 · Updated 3 months ago
- Clue-inspired puzzles for testing LLM deduction abilities ☆40 · Updated 5 months ago
- Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3) ☆98 · Updated 3 weeks ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆220 · Updated 2 months ago
- Code and data for the Chain-of-Draft (CoD) paper ☆318 · Updated 5 months ago
- Work in progress ☆72 · Updated last month
- ☆51 · Updated 2 months ago
- An open-source implementation of LFMs from Liquid AI: Liquid Foundation Models ☆185 · Updated last week