SamsungSAILMontreal / ByteCraft
☆41Updated 10 months ago
Alternatives and similar repositories for ByteCraft
Users who are interested in ByteCraft are comparing it to the libraries listed below.
- webgpu autograd library☆33Updated 8 months ago
- ☆40Updated last year
- This repository has code for fine-tuning LLMs with GRPO specifically for Rust Programming using cargo as feedback☆114Updated 11 months ago
- A tree-based prefix cache library that allows rapid creation of looms: hierarchical branching pathways of LLM generations.☆77Updated 11 months ago
- Training hybrid models for dummies.☆29Updated 3 months ago
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth…☆35Updated 10 months ago
- Training AI for Super Smash Bros. Melee☆32Updated 10 months ago
- ☆148Updated last year
- look how they massacred my boy☆63Updated last year
- Implementing the BitNet model in Rust☆44Updated last year
- ☆52Updated last year
- 33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU☆13Updated last year
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, …☆43Updated 2 years ago
- MLX port for xjdr's entropix sampler (mimics jax implementation)☆61Updated last year
- An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO☆29Updated last week
- SVGBench: A challenging LLM benchmark that tests knowledge, coding, physical reasoning capabilities of LLMs.☆62Updated this week
- Approximating the joint distribution of language models via MCTS☆22Updated last year
- https://x.com/BlinkDL_AI/status/1884768989743882276☆28Updated 9 months ago
- Using Large Language Models for Repo-wide Type Prediction☆114Updated 2 years ago
- Latent Large Language Models☆19Updated last year
- ☆100Updated last week
- Implementation of Mamba in Rust☆92Updated last year
- Working implementation of DeepSeek MLA☆45Updated last year
- Synthetic data derived by templating, few shot prompting, transformations on public domain corpora, and monte carlo tree search.☆32Updated 4 months ago
- Plotting (entropy, varentropy) for small LMs☆99Updated 8 months ago
- AirLLM 70B inference with single 4GB GPU☆17Updated 7 months ago
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.☆102Updated 6 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min…☆26Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆59Updated 3 months ago
- A minimal implementation of Drifting Models for 2D toy data. Unlike diffusion/flow models that iterate at inference, drifting models evo…☆39Updated this week