tiiuae / Falcon-H1
All information and news about the Falcon-H1 series
☆38 · Updated this week
Alternatives and similar repositories for Falcon-H1
Users interested in Falcon-H1 are comparing it to the libraries listed below.
- EvaByte: Efficient Byte-level Language Models at Scale ☆102 · Updated 2 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆33 · Updated 3 months ago
- A repository for research on medium-sized language models. ☆76 · Updated last year
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆28 · Updated 9 months ago
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆23 · Updated 2 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆32 · Updated 3 months ago
- Simple GRPO scripts and configurations. ☆58 · Updated 4 months ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆109 · Updated 2 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 · Updated 9 months ago
- GoldFinch and other hybrid transformer components ☆45 · Updated 11 months ago
- Lego for GRPO ☆28 · Updated last month
- Simple repository for training small reasoning models ☆33 · Updated 4 months ago
- Verifiers for LLM Reinforcement Learning ☆60 · Updated 2 months ago
- Train your own SOTA deductive reasoning model ☆94 · Updated 3 months ago
- PyTorch implementation of models from the Zamba2 series. ☆182 · Updated 5 months ago
- Triton implementation of the HyperAttention algorithm ☆48 · Updated last year
- NanoGPT-speedrunning for the poor T4 enjoyers ☆66 · Updated 2 months ago
- Using FlexAttention to compute attention with different masking patterns ☆44 · Updated 9 months ago
- Collection of autoregressive model implementations ☆85 · Updated 2 months ago
- Prune transformer layers ☆69 · Updated last year
- The evaluation framework for training-free sparse attention in LLMs ☆69 · Updated last week