ericyuegu / hal
Training AI for Super Smash Bros. Melee
☆32 · Updated 10 months ago
Alternatives and similar repositories for hal
Users who are interested in hal are comparing it to the repositories listed below.
- ☆27 · Updated last year
- ☆40 · Updated last year
- σ-GPT: A New Approach to Autoregressive Models ☆70 · Updated last year
- Shaping capabilities with token-level pretraining data filtering ☆75 · Updated 2 weeks ago
- ☆24 · Updated 8 months ago
- LLMs represent numbers on a helix and manipulate that helix to do addition. ☆28 · Updated last year
- OMNI: Open-endedness via Models of human Notions of Interestingness ☆58 · Updated last year
- ☆34 · Updated last year
- look how they massacred my boy ☆63 · Updated last year
- Generative cellular automaton-like learning environments for RL. ☆20 · Updated last year
- Video Diffusion Model. Autoregressive, long context, efficient training and inference. WIP ☆34 · Updated 5 months ago
- H-Net Dynamic Hierarchical Architecture ☆81 · Updated 4 months ago
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆107 · Updated 2 months ago
- ☆62 · Updated 7 months ago
- Minimal (truly) muP implementation, consistent with the notation of the TP4 and TP5 papers ☆14 · Updated last month
- Realtime latent world model inference demo ☆49 · Updated last year
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single-machine microbatches, in PyTorch ☆25 · Updated last year
- ☆55 · Updated last year
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training ☆132 · Updated last year
- Losslessly encode text natively with arithmetic coding and HuggingFace Transformers ☆77 · Updated 3 months ago
- ☆22 · Updated last year
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆32 · Updated 8 months ago
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆110 · Updated 11 months ago
- ☆53 · Updated 2 years ago
- Latent Large Language Models ☆19 · Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆86 · Updated 4 months ago
- Learning Universal Predictors ☆81 · Updated last year
- Approximating the joint distribution of language models via MCTS ☆22 · Updated last year
- NanoGPT speedrunning for the poor T4 enjoyers ☆73 · Updated 9 months ago
- Jax-like function transformation engine, but micro: microjax ☆34 · Updated last year