GeeeekExplorer / cupytorchLinks

A small framework mimics PyTorch using CuPy or NumPy

☆41

Alternatives and similar repositories for cupytorch

Users that are interested in cupytorch are comparing it to the libraries listed below

Sorting:

FreedomIntelligence / FastLLM
Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler];
☆40Updated last year
eedalong / Dpex
Distributed DataLoader For Pytorch Based On Ray
☆24Updated 3 years ago
cheneydon / efficient-bert
This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron …
☆33Updated 2 years ago
Lightning-Universe / lightning-ColossalAI
Large Scale Distributed Model Training strategy with Colossal AI and Lightning AI
☆57Updated last year
juvi21 / CoPE-cuda
Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719
☆22Updated last year
BlinkDL / minGPT-tuned
A *tuned* minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
☆115Updated 3 years ago
bojone / tiger
A Tight-fisted Optimizer
☆48Updated 2 years ago
Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts
☆22Updated last year
GeeeekExplorer / transformers-patch
patches for huggingface transformers to save memory
☆26Updated last month
CyndxAI / QKNorm
Code for the paper "Query-Key Normalization for Transformers"
☆43Updated 4 years ago
yuzhenmao / IceFormer
Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024).
☆25Updated last year
sharonzhou / ICLR2021-Stats
ICLR 2021 Stats & Graphs
☆31Updated 3 years ago
bojone / analytical-classification
逻辑回归和单层softmax的解析解
☆12Updated 3 years ago
Ascend / MindSpeed-RL
☆27Updated this week
vedaldi / micro_llama
A tiny, didactical implementation of LLAMA 3
☆41Updated 7 months ago
megvii-research / basedet
An object detection codebase based on MegEngine.
☆28Updated 2 years ago
SkyworkAI / MindLink
☆53Updated this week
MAC-AutoML / YOCO-BERT
The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Natu…
☆48Updated 4 years ago
QipengGuo / NLP-Notes
Notes of my introduction about NLP in Fudan University
☆37Updated 4 years ago
BBuf / flash-rwkv
☆31Updated last year
RUCAIBox / QuantizedEmpirical
☆14Updated last year
ag1988 / top_k_attention
The accompanying code for "Memory-efficient Transformers via Top-k Attention" (Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonatha…
☆69Updated 3 years ago
DACUS1995 / pytorch-mmap-dataset
A custom pytorch Dataset extension that provides a faster iteration and better RAM usage
☆44Updated last year
shaochenze / PatchTrain
Code for paper "Patch-Level Training for Large Language Models"
☆85Updated 8 months ago
jiahe7ay / infini-mini-transformer
This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train…
☆58Updated last year
kyegomez / Reka-Torch
Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch
☆30Updated 2 weeks ago
yikangshen / megablocks
☆20Updated last year
jungokasai / T2R
☆14Updated 2 years ago
lucasjinreal / wnnx_models
Various test models in WNNX format. It can view with `pip install wnetron && wnetron`
☆12Updated 3 years ago
NonvolatileMemory / flash_attn_gqa
triton ver of gqa flash attn, based on the tutorial
☆11Updated 11 months ago