sangmichaelxie / cs324_p2
Project 2 (Building Large Language Models) for Stanford CS324: Understanding and Developing Large Language Models (Winter 2022)
☆104 · Updated 2 years ago
Alternatives and similar repositories for cs324_p2:
Users interested in cs324_p2 are comparing it to the libraries listed below.
- Functional local implementations of main model parallelism approaches ☆95 · Updated 2 years ago
- A puzzle to learn about prompting ☆127 · Updated last year
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day ☆255 · Updated last year
- Website for hosting the Open Foundation Models Cheat Sheet. ☆267 · Updated last week
- RuLES: a benchmark for evaluating rule-following in language models ☆220 · Updated last month
- Code repository for the c-BTM paper ☆106 · Updated last year
- Evaluating LLMs with fewer examples ☆150 · Updated last year
- ☆264 · Updated 2 months ago
- An interactive exploration of Transformer programming. ☆262 · Updated last year
- [NeurIPS 2023] Learning Transformer Programs ☆159 · Updated 10 months ago
- Supercharge huggingface transformers with model parallelism. ☆76 · Updated 6 months ago
- ☆166 · Updated last year
- ☆128 · Updated 2 weeks ago
- Scaling Data-Constrained Language Models ☆335 · Updated 6 months ago
- ☆92 · Updated last year
- Fast bare-bones BPE for modern tokenizer training ☆152 · Updated 2 weeks ago
- A comprehensive deep dive into the world of tokens ☆221 · Updated 9 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆59 · Updated 2 months ago
- Extract full next-token probabilities via language model APIs ☆240 · Updated last year
- Puzzles for exploring transformers ☆342 · Updated last year
- ☆84 · Updated 6 months ago
- ☆150 · Updated last year
- Inference code for LLaMA models in JAX ☆117 · Updated 10 months ago
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆218 · Updated last year
- Minimal PyTorch implementation of BM25 (with sparse tensors) ☆100 · Updated last year (a minimal sketch of the technique follows this list)
- An extension of the nanoGPT repository for training small MoE models. ☆123 · Updated last month
- ☆87 · Updated last year
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆81 · Updated last year
- Large language models (LLMs) made easy, EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆72 · Updated 8 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆126 · Updated last year
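The BM25 entry above describes ranking with sparse term-frequency tensors. The sketch below is a minimal, self-contained illustration of that scoring formula in PyTorch; it is not code from the linked repository, and the function names, toy corpus, and default parameters (k1=1.5, b=0.75) are assumptions for illustration only.

```python
# Illustrative BM25 scoring with a sparse term-frequency matrix (hypothetical sketch,
# not taken from the repository listed above). Documents are assumed to be
# pre-tokenized into integer term ids.
import torch

def build_tf_matrix(docs, vocab_size):
    """Build a sparse (num_docs x vocab_size) term-frequency matrix."""
    rows, cols, vals = [], [], []
    for i, doc in enumerate(docs):
        counts = {}
        for t in doc:
            counts[t] = counts.get(t, 0) + 1
        for t, c in counts.items():
            rows.append(i); cols.append(t); vals.append(float(c))
    indices = torch.tensor([rows, cols])
    return torch.sparse_coo_tensor(indices, torch.tensor(vals),
                                   (len(docs), vocab_size)).coalesce()

def bm25_scores(tf, query_terms, k1=1.5, b=0.75):
    """Score every document against a bag of query term ids using BM25."""
    n_docs = tf.shape[0]
    dense = tf.to_dense()                      # acceptable for a toy corpus
    doc_len = dense.sum(dim=1)                 # document lengths
    avgdl = doc_len.mean()                     # average document length
    df = (dense > 0).sum(dim=0).float()        # document frequency per term
    idf = torch.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    scores = torch.zeros(n_docs)
    for t in query_terms:
        f = dense[:, t]                        # term frequency of t in each document
        scores += idf[t] * f * (k1 + 1) / (f + k1 * (1 - b + b * doc_len / avgdl))
    return scores

# Toy usage: three documents over a vocabulary of 5 term ids, query of two terms.
docs = [[0, 1, 1, 2], [2, 3, 3], [0, 2, 4, 4, 4]]
tf = build_tf_matrix(docs, vocab_size=5)
print(bm25_scores(tf, query_terms=[1, 2]))
```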