Aleph-Alpha-Research / trigrams
☆56 Updated this week
Alternatives and similar repositories for trigrams
Users interested in trigrams are comparing it to the libraries listed below.
- Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆72 Updated 8 months ago
- ☆47 Updated 8 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 8 months ago
- Truly flash implementation of the DeBERTa disentangled attention mechanism. ☆47 Updated this week
- EvaByte: Efficient Byte-level Language Models at Scale ☆92 Updated 3 weeks ago
- ☆43 Updated last year
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆27 Updated 7 months ago
- ☆48 Updated 6 months ago
- ☆114 Updated 2 months ago
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ☆60 Updated 6 months ago
- A repository for research on medium-sized language models. ☆76 Updated 11 months ago
- GoldFinch and other hybrid transformer components ☆45 Updated 9 months ago
- ☆78 Updated 8 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆30 Updated 2 months ago
- ☆56 Updated this week
- ☆43 Updated 3 months ago
- ☆28 Updated 5 months ago
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… ☆14 Updated last year
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆23 Updated last month
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆41 Updated 3 weeks ago
- ☆81 Updated last year
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆55 Updated 3 weeks ago
- ☆38 Updated last year
- ☆33 Updated 10 months ago
- Experiments for efforts to train a new and improved t5 ☆77 Updated last year
- NanoGPT (124M) quality in 2.67B tokens ☆28 Updated last week
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆98 Updated 2 months ago
- prime-rl is a codebase for decentralized RL training at scale ☆89 Updated this week
- ☆53 Updated last year
- Train your own SOTA deductive reasoning model ☆92 Updated 2 months ago