Modeling code for a BitNet b1.58 Llama-style model.
☆25 · Apr 30, 2024 · Updated 2 years ago
Alternatives and similar repositories for bitnet
Users that are interested in bitnet are comparing it to the libraries listed below.
- implementation of https://arxiv.org/pdf/2312.09299 ☆21 · Jul 3, 2024 · Updated last year
- QLoRA: Efficient Finetuning of Quantized LLMs ☆11 · Jul 22, 2023 · Updated 2 years ago
- Implementation of BitNet-1.58 instruct tuning ☆29 · Apr 14, 2024 · Updated 2 years ago
- Set of scripts to finetune LLMs ☆38 · Mar 30, 2024 · Updated 2 years ago
- [COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing" ☆20 · Apr 9, 2025 · Updated last year
- ☆20 · Apr 29, 2024 · Updated 2 years ago
- 4-bit Shampoo for Memory-Efficient Network Training (NeurIPS 2024) ☆13 · Feb 13, 2025 · Updated last year
- alternative way to calculating self attention ☆18 · May 25, 2024 · Updated last year
- Unofficial implementations of block/layer-wise pruning methods for LLMs. ☆78 · Apr 29, 2024 · Updated 2 years ago
- ☆43 · Aug 5, 2025 · Updated 9 months ago
- 🌳 MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test a… ☆36 · Jan 18, 2026 · Updated 4 months ago
- ☆13 · Apr 25, 2024 · Updated 2 years ago
- KANs and MLPs ☆12 · Jun 7, 2024 · Updated last year
- Train to 94% on CIFAR-10 in 4.4 seconds on a single A100 ☆12 · Dec 30, 2023 · Updated 2 years ago
- Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆19 · Jul 24, 2025 · Updated 9 months ago
- Maximal Update Parametrization (μP) with Flax & Optax. ☆16 · Dec 27, 2023 · Updated 2 years ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · May 22, 2024 · Updated 2 years ago
- Matrix Product State algorithm for computing characters of the symmetric group S_n ☆11 · Sep 26, 2025 · Updated 7 months ago
- Simple GRPO scripts and configurations. ☆59 · Feb 6, 2025 · Updated last year
- ☆18 · May 7, 2026 · Updated 2 weeks ago
- Bullseye Polytope Clean-Label Poisoning Attack ☆18 · Nov 5, 2020 · Updated 5 years ago
- https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation ☆11 · Sep 11, 2019 · Updated 6 years ago
- ☆21 · Sep 6, 2021 · Updated 4 years ago
- ☆11 · Jun 14, 2019 · Updated 6 years ago
- ☆22 · Jan 23, 2024 · Updated 2 years ago
- Using Demucs in comfyUI, make Music Source Separation ☆12 · Dec 12, 2025 · Updated 5 months ago
- Neural network density models for speech separation. ☆20 · Nov 26, 2020 · Updated 5 years ago
- a simple variational auto encoder with some exploration ☆12 · Nov 22, 2024 · Updated last year
- manage histories of LLM applied applications ☆91 · Nov 17, 2023 · Updated 2 years ago
- ☆15 · Oct 31, 2023 · Updated 2 years ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆207 · Aug 10, 2024 · Updated last year
- Here is my implementation of Center Loss with Keras ☆11 · May 2, 2018 · Updated 8 years ago
- ☆13 · Sep 8, 2020 · Updated 5 years ago
- ☆11 · Jul 6, 2023 · Updated 2 years ago
- Scale-able Full Stack Education Platform utilising Machine Learning, Data Science & GenAI ☆11 · May 11, 2026 · Updated last week
- A toy text-to-image model trained from scratch. ☆19 · Jun 9, 2025 · Updated 11 months ago
- ☆167 · Aug 8, 2025 · Updated 9 months ago
- Language modeling with linear-cost context ☆118 · Sep 25, 2025 · Updated 7 months ago
- Train vision models using JAX and 🤗 transformers ☆102 · Dec 14, 2025 · Updated 5 months ago