BlinkDL / LM-Trick-Questions
Here we collect trick questions and failed tasks for open-source LLMs, to help improve them.
☆32 · Updated last year
Related projects
Alternatives and complementary repositories for LM-Trick-Questions
- Let us make Psychohistory (as in Asimov) a reality, and accessible to everyone. Useful for LLM grounding and games / fiction / business /… ☆40 · Updated last year
- Here we will test various linear attention designs. ☆56 · Updated 7 months ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers. ☆43 · Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated last year
- ☆28 · Updated 5 months ago
- A repository for research on medium-sized language models. ☆74 · Updated 6 months ago
- ☆33 · Updated 4 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆19 · Updated 2 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video… ☆29 · Updated 5 months ago
- RWKV model implementation ☆38 · Updated last year
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆30 · Updated 3 months ago
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆53 · Updated 6 months ago
- GoldFinch and other hybrid transformer components ☆40 · Updated 4 months ago
- ☆18 · Updated 5 months ago
- Griffin MQA + Hawk Linear RNN Hybrid ☆85 · Updated 6 months ago
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning ☆33 · Updated last year
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆109 · Updated last month
- Demonstration that fine-tuning a RoPE model on sequences longer than those seen in pre-training extends the model's context limit ☆63 · Updated last year
- SparseGPT + GPTQ Compression of LLMs like LLaMa, OPT, Pythia ☆41 · Updated last year
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆33 · Updated 3 weeks ago
- RWKV-7: Surpassing GPT ☆47 · Updated last week
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆29 · Updated 3 weeks ago
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… ☆14 · Updated last year
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling ☆35 · Updated 11 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆93 · Updated last month
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆43 · Updated last year
- Explorations into the recently proposed Taylor Series Linear Attention ☆90 · Updated 3 months ago
- Zeta implementation of a reusable, plug-and-play feedforward from the paper "Exponentially Faster Language Modeling" ☆15 · Updated 2 weeks ago