jiasenlu / LL3M
LL3M: Large Language and Multi-Modal Model in JAX
☆72 · Updated last year

Alternatives and similar repositories for LL3M:
Users interested in LL3M are comparing it to the repositories listed below.
- M4 experiment logbook — ☆57 · Updated last year
- [NeurIPS 2024] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies (https://arxiv.org/abs/2407.13623) — ☆84 · Updated 7 months ago
- Language models scale reliably with over-training and on downstream tasks — ☆96 · Updated last year
- [ICLR 2025] Code for the paper "Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning" — ☆52 · Updated 2 months ago
- (no description) — ☆91 · Updated 7 months ago
- (no description) — ☆78 · Updated 8 months ago
- This repository is maintained to release datasets and models for multimodal puzzle reasoning. — ☆81 · Updated 2 months ago
- Multimodal language model benchmark, featuring challenging examples — ☆167 · Updated 4 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) — ☆153 · Updated 3 weeks ago
- Tiny re-implementation of MDM in the style of LLaDA and the nano-gpt speedrun — ☆49 · Updated last month
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. — ☆94 · Updated this week
- Implementation of Infini-Transformer in PyTorch — ☆110 · Updated 4 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch — ☆165 · Updated 4 months ago
- EvaByte: Efficient Byte-level Language Models at Scale — ☆91 · Updated 2 weeks ago
- Easily run PyTorch on multiple GPUs & machines — ☆45 · Updated last month
- Official implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" — ☆100 · Updated 2 weeks ago
- (no description) — ☆78 · Updated 10 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" — ☆47 · Updated last year
- A repository for research on medium-sized language models. — ☆76 · Updated 11 months ago
- Matryoshka Multimodal Models — ☆101 · Updated 3 months ago
- A basic pure-PyTorch implementation of flash attention — ☆16 · Updated 6 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" — ☆97 · Updated 7 months ago
- 🦾 EvalGIM (pronounced "EvalGym") is an evaluation library for generative image models. It enables easy-to-use, reproducible automatic… — ☆73 · Updated 4 months ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" — ☆97 · Updated 3 weeks ago
- (no description) — ☆95 · Updated last year
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models — ☆51 · Updated 2 months ago
- Self-Alignment with Principle-Following Reward Models — ☆160 · Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" — ☆37 · Updated last year
- General Reasoner: Advancing LLM Reasoning Across All Domains — ☆76 · Updated last week
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto — ☆55 · Updated 11 months ago