clankur / einygpt
A transformer implemented primarily using einops and trained on the TinyStories dataset
☆13Updated last year
Alternatives and similar repositories for einygpt
Users that are interested in einygpt are comparing it to the libraries listed below
- Repository for "I am a Strange Dataset: Metalinguistic Tests for Language Models"☆45Updated 2 years ago
- Experiments with generating open-source language model assistants☆97Updated 2 years ago
- Exploring finetuning public checkpoints on filtered 8K sequences on Pile☆116Updated 2 years ago
- A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers.☆62Updated 7 months ago
- Official code for ACL 2023 (short, findings) paper "Recursion of Thought: A Divide and Conquer Approach to Multi-Context Reasoning with L…☆45Updated 2 years ago
- Repository for the code of the "PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided Decoding" paper, NAACL'22☆66Updated 3 years ago
- Code repository for the c-BTM paper☆108Updated 2 years ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers☆60Updated last year
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens☆150Updated last month
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP☆58Updated 3 years ago
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free☆232Updated last year
- ☆105Updated last year
- A fast implementation of T5/UL2 in PyTorch using Flash Attention☆113Updated 3 months ago
- Multi-Domain Expert Learning☆67Updated 2 years ago
- ☆50Updated last year
- Code for Zero-Shot Tokenizer Transfer☆142Updated last year
- Model implementation for the contextual embeddings project☆40Updated 8 months ago
- Synthetic Data Generation for Evaluation☆13Updated 11 months ago
- ☆20Updated 2 years ago
- A repository for transformer critique learning and generation☆89Updated 2 years ago
- Comprehensive analysis of the difference in performance between QLoRA, LoRA, and full fine-tunes.☆83Updated 2 years ago
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping.☆27Updated 2 years ago
- My explorations into editing the knowledge and memories of an attention network☆35Updated 3 years ago
- ☆74Updated 2 years ago
- Simple GRPO scripts and configurations.☆59Updated last year
- Some common Hugging Face transformers in maximal update parametrization (µP)☆87Updated 3 years ago
- ☆48Updated last year
- Minimal PyTorch implementation of BM25 (with sparse tensors)☆104Updated 3 months ago
- ☆77Updated last year
- Multipack distributed sampler for fast padding-free training of LLMs☆204Updated last year