subhashk01 / LLM-addition
LLMs represent numbers on a helix and manipulate that helix to do addition.
☆21 · Updated last month
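As a rough illustration of the helix idea, the sketch below encodes an integer n as a linear component plus one (cos, sin) pair per period T. It then adds two numbers by summing the linear parts and rotating each circle with the angle-addition identities. The periods 2, 5, 10 and 100 and the helper names `helix`, `add_helices` and `read_units_digit` are assumptions chosen for illustration. This is a hand-built idealization of the mechanism, not the repository's code or the computation a model actually learns.

```python
import math

# Assumption: illustrative periods for the circular components of the helix.
PERIODS = (2, 5, 10, 100)

def helix(n):
    """Hypothetical encoding: [n, cos(2*pi*n/T), sin(2*pi*n/T) for each T]."""
    feats = [float(n)]
    for T in PERIODS:
        theta = 2 * math.pi * n / T
        feats.extend([math.cos(theta), math.sin(theta)])
    return feats

def add_helices(ha, hb):
    """Turn helix(a) and helix(b) into helix(a + b).

    The linear component is summed directly; each (cos, sin) pair is rotated using
    cos(x + y) = cos x cos y - sin x sin y and
    sin(x + y) = sin x cos y + cos x sin y.
    """
    out = [ha[0] + hb[0]]
    for i in range(len(PERIODS)):
        ca, sa = ha[1 + 2 * i], ha[2 + 2 * i]
        cb, sb = hb[1 + 2 * i], hb[2 + 2 * i]
        out.extend([ca * cb - sa * sb, sa * cb + ca * sb])
    return out

def read_units_digit(h):
    """Decode (a + b) mod 10 from the angle on the T = 10 circle."""
    i = PERIODS.index(10)
    angle = math.atan2(h[2 + 2 * i], h[1 + 2 * i])
    return round(angle * 10 / (2 * math.pi)) % 10
```

For example, `add_helices(helix(37), helix(45))` matches `helix(82)` up to floating-point error, and `read_units_digit` applied to it returns 2. The point of the periodic components is that addition becomes a rotation, so digit-level information (such as the units digit on the T = 10 circle) can be read off without carrying out column-wise arithmetic.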
Alternatives and similar repositories for LLM-addition:
Users interested in LLM-addition are comparing it to the repositories listed below:
- Repository to create traveling waves integrate special information through time ☆49 · Updated 2 weeks ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆91 · Updated 3 weeks ago
- ☆74 · Updated 7 months ago
- σ-GPT: A New Approach to Autoregressive Models ☆62 · Updated 7 months ago
- look how they massacred my boy ☆63 · Updated 5 months ago
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆37 · Updated last month
- ☆48 · Updated 4 months ago
- ☆124 · Updated this week
- Simple GRPO scripts and configurations. ☆58 · Updated last month
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆54 · Updated 11 months ago
- ☆27 · Updated 8 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆85 · Updated last week
- ☆38 · Updated 8 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆39 · Updated last month
- Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆17 · Updated last week
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆72 · Updated 4 months ago
- Official Code Release for "Training a Generally Curious Agent" ☆19 · Updated 3 weeks ago
- ☆21 · Updated 4 months ago
- Lego for GRPO ☆25 · Updated last week
- ☆20 · Updated 3 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆52 · Updated last week
- Latent Large Language Models ☆17 · Updated 7 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆135 · Updated 2 weeks ago
- ☆49 · Updated last year
- A Qwen .5B reasoning model trained on OpenR1-Math-220k ☆12 · Updated last month
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family. ☆28 · Updated last month
- BH hackathon ☆14 · Updated 11 months ago
- ☆16 · Updated 3 weeks ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆125 · Updated 3 months ago
- A repository for research on medium sized language models. ☆76 · Updated 10 months ago