Cerebras / gigaGPT

a small code base for training large models

☆307

Alternatives and similar repositories for gigaGPT

Users who are interested in gigaGPT are comparing it to the libraries listed below.

google-deepmind / recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
☆645Updated 2 months ago
persimmon-ai-labs / adept-inference
Inference code for Persimmon-8B
☆415Updated last year
valine / NeuralFlow
Visualize the intermediate output of Mistral 7B
☆367Updated 6 months ago
adamkarvonen / chess_llm_interpretability
Visualizing the internal board state of a GPT trained on chess PGN strings, and performing interventions on its internal board state and …
☆208Updated 8 months ago
tysam-code / hlb-gpt
Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi…
☆349Updated last year
sumo43 / loopvlm
run paligemma in real time
☆131Updated last year
pbelcak / UltraFastBERT
The repository for the code of the UltraFastBERT paper
☆516Updated last year
mistralai-sf24 / hackathon
☆447Updated last year
SkunkworksAI / hydra-moe
☆416Updated last year
Locutusque / TPU-Alignment
Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free
☆232Updated 9 months ago
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆277Updated last year
abacaj / train-with-fsdp
☆93Updated last year
euclaise / SlimTrainer
Full finetuning of large language models without large memory requirements
☆94Updated last year
sabetAI / BLoRA
batched loras
☆344Updated last year
gautierdag / bpeasy
Fast bare-bones BPE for modern tokenizer training
☆160Updated last month
sdan / selfextend
an implementation of Self-Extend, to expand the context window via grouped attention
☆119Updated last year
kolinko / effort
An implementation of bucketMul LLM inference
☆221Updated last year
SakanaAI / evo-memory
Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers.
☆318Updated 9 months ago
idoh / mamba.np
A pure NumPy implementation of Mamba.
☆223Updated last year
jondurbin / bagel
A bagel, with everything.
☆323Updated last year
mistralai / megablocks-public
☆864Updated last year
QuixiAI / laserRMT
This is our own implementation of 'Layer Selective Rank Reduction'
☆239Updated last year
apple / ml-sigma-reparam
☆304Updated last year
magicproduct / hash-hop
Long context evaluation for large language models
☆220Updated 5 months ago
alasdairforsythe / tokenmonster
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
☆595Updated last year
apoorvumang / prompt-lookup-decoding
☆556Updated 11 months ago
astramind-ai / BitMat
An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
☆154Updated 9 months ago
thomasgauthier / LoRD
Low-Rank adapter extraction for fine-tuned transformers models
☆175Updated last year
SumanthRH / tokenization
A comprehensive deep dive into the world of tokens
☆225Updated last year
geov-ai / geov
The GeoV model is a large language model designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER).…
☆121Updated 2 years ago