jiasenlu / LL3M
LL3M: Large Language and Multi-Modal Model in Jax
☆62Updated 4 months ago
Related projects:
- M4 experiment logbook☆56Updated last year
- Language models scale reliably with over-training and on downstream tasks☆91Updated 5 months ago
- Multimodal language model benchmark, featuring challenging examples☆144Updated last month
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"☆87Updated 8 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆44Updated 8 months ago
- ☆68Updated 2 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly"☆121Updated 3 months ago
- A repository for research on medium-sized language models.☆71Updated 3 months ago
- ☆60Updated 5 months ago
- ☆66Updated 3 months ago
- Triton Implementation of HyperAttention Algorithm☆46Updated 9 months ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…☆52Updated last month
- ☆61Updated 6 months ago
- ☆87Updated 2 months ago
- Self-Alignment with Principle-Following Reward Models☆144Updated 6 months ago
- ☆50Updated last month
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts☆101Updated last year
- ☆42Updated this week
- See details in https://github.com/pytorch/xla/blob/r1.12/torch_xla/distributed/fsdp/README.md☆21Updated last year
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).☆15Updated last week
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d…☆183Updated 3 weeks ago
- This repo is based on https://github.com/jiaweizzhao/GaLore, paper coming soon☆18Updated this week
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"☆24Updated 5 months ago
- Big-Interleaved-Dataset☆57Updated last year
- Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"☆59Updated 7 months ago
- Reproduction of "RLCD Reinforcement Learning from Contrast Distillation for Language Model Alignment"☆63Updated last year
- ☆69Updated 4 months ago
- Randomized Positional Encodings Boost Length Generalization of Transformers☆78Updated 6 months ago
- Implementation of Infini-Transformer in PyTorch☆100Updated last month
- ☆98Updated 2 months ago