THUDM / SwissArmyTransformer
SwissArmyTransformer is a flexible and powerful library for developing your own Transformer variants.
☆1,068 · Updated 2 months ago
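The library's pitch is composing your own Transformer variants by swapping custom components into a base model. As a rough illustration of that pattern in plain PyTorch (the names `BaseBlock`, `GatedAttention`, and `swap_attention` are hypothetical and not SwissArmyTransformer's actual API), a minimal sketch:

```python
import torch
import torch.nn as nn

class BaseBlock(nn.Module):
    """A vanilla pre-norm Transformer block."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class GatedAttention(nn.Module):
    """A toy 'variant': standard attention with a learned output gate."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.inner = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(dim, dim)

    def forward(self, q, k, v, need_weights=False):
        out, _ = self.inner(q, k, v, need_weights=False)
        return out * torch.sigmoid(self.gate(q)), None  # same (output, weights) shape as nn.MultiheadAttention

def swap_attention(block: BaseBlock, dim: int, heads: int) -> BaseBlock:
    """Turn a base block into a variant by replacing one submodule."""
    block.attn = GatedAttention(dim, heads)
    return block

block = swap_attention(BaseBlock(64, 4), 64, 4)
y = block(torch.randn(2, 16, 64))   # (batch, seq, dim)
print(y.shape)                      # torch.Size([2, 16, 64])
```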
Alternatives and similar repositories for SwissArmyTransformer:
Users interested in SwissArmyTransformer are comparing it to the libraries listed below.
- Open Academic Research on Improving LLaMA to SOTA LLM ☆1,617 · Updated last year
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ☆864 · Updated 3 months ago
- LOMO: LOw-Memory Optimization ☆981 · Updated 8 months ago
- Emu Series: Generative Multimodal Models from BAAI ☆1,695 · Updated 5 months ago
- Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion" ☆1,405 · Updated last year
- Real Transformer TeraFLOPS on various GPUs ☆898 · Updated last year
- A plug-and-play library for parameter-efficient tuning (Delta Tuning) ☆1,019 · Updated 6 months ago
- [NIPS2023] RRHF & Wombat ☆804 · Updated last year
- Collaborative Training of Large Language Models in an Efficient Way ☆413 · Updated 6 months ago
- Tencent Pre-training framework in PyTorch & Pre-trained Model Zoo ☆1,068 · Updated 7 months ago
- We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-Tuning) ☆2,713 · Updated last year
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆936 · Updated 3 months ago
- An optimized deep prompt tuning strategy comparable to fine-tuning across scales and tasks ☆2,020 · Updated last year
- Next-Token Prediction is All You Need ☆2,042 · Updated last week
- 🩹 Editing large language models within 10 seconds ⚡ ☆1,317 · Updated last year
- Rotary Transformer (RoPE; a sketch of the rotary embedding appears after this list) ☆922 · Updated 3 years ago
- Efficient Training (including pre-training and fine-tuning) for Big Models ☆580 · Updated 8 months ago
- A fast MoE implementation for PyTorch ☆1,682 · Updated last month
- Best practice for training LLaMA models in Megatron-LM ☆645 · Updated last year
- Hugging Face mirror download ☆567 · Updated last week
- A purer tokenizer with a higher compression ratio ☆470 · Updated 3 months ago
- [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition (a minimal LoRA sketch appears after this list) ☆619 · Updated 8 months ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,378 · Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆2,029 · Updated this week
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch ☆638 · Updated 2 months ago
- Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net from DeepMind, in PyTorch ☆1,235 · Updated 2 years ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,218 · Updated 2 weeks ago
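The "Rotary Transformer" entry above refers to rotary position embeddings (RoPE), which encode position by rotating each (even, odd) channel pair of the queries and keys by a position-dependent angle, so attention scores depend only on relative offsets. A minimal sketch in plain PyTorch (the helper name `rope` is mine, not that repository's API):

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (..., seq, dim).
    Channel pair (2i, 2i+1) at position pos is rotated by pos * base**(-2i/dim)."""
    seq, dim = x.shape[-2], x.shape[-1]
    assert dim % 2 == 0
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * inv_freq  # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Relative-position property: rotations compose, so R(m)q · R(n)k = q · R(n-m)k,
# and the attention score depends only on the offset n - m.
q = rope(torch.randn(1, 8, 64))
k = rope(torch.randn(1, 8, 64))
scores = q @ k.transpose(-2, -1)   # would feed softmax attention
```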
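Several entries above (the unified parameter-efficient-tuning interface, LoraHub) build on LoRA, which freezes a pretrained weight W and trains a low-rank update BA, giving an effective weight W + (α/r)·BA. A minimal sketch of a single LoRA-wrapped linear layer (the class `LoRALinear` and its arguments are illustrative, not LoraHub's API):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update:
    y = base(x) + (alpha / r) * (x @ A.T) @ B.T"""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # freeze pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # B = 0 makes the update a no-op at init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * ((x @ self.A.t()) @ self.B.t())

layer = LoRALinear(nn.Linear(128, 128))
out = layer(torch.randn(4, 128))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)   # torch.Size([4, 128]) 2048 trainable params vs 16,512 in the base layer
```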