fuzihaofzh / cstl
The C++ Standard Template Library (STL) for Python.
☆24 Updated 2 years ago
Alternatives and similar repositories for cstl
Users who are interested in cstl are comparing it to the libraries listed below.
- Demystify RAM Usage in Multi-Process Data Loaders ☆201 Updated 2 years ago
- Prune a model while finetuning or training. ☆405 Updated 3 years ago
- Implementation of a Transformer, but completely in Triton ☆274 Updated 3 years ago
- A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to fac… ☆240 Updated this week
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ☆159 Updated last year
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆214 Updated 2 years ago
- ☆119 Updated last year
- Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint ☆409 Updated last year
- See details in https://github.com/pytorch/xla/blob/r1.12/torch_xla/distributed/fsdp/README.md ☆25 Updated 2 years ago
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… ☆205 Updated last year
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆120 Updated last year
- Memory-Efficient CUDA kernels for training ConvNets with PyTorch. ☆42 Updated 7 months ago
- Torch Distributed Experimental ☆117 Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆224 Updated last year
- A library for unit scaling in PyTorch ☆130 Updated 2 months ago
- PyTorch code for hierarchical k-means -- a data curation method for self-supervised learning ☆189 Updated last year
- Low-bit optimizers for PyTorch ☆131 Updated last year
- Slicing a PyTorch Tensor Into Parallel Shards ☆300 Updated 3 months ago
- ☆183 Updated 11 months ago
- Code release for "Dropout Reduces Underfitting" ☆313 Updated 2 years ago
- A lightweight library designed to accelerate the process of training PyTorch models by providing a minimal, but extensible training loop … ☆190 Updated 3 months ago
- A general and accurate MACs / FLOPs profiler for PyTorch models ☆629 Updated last month
- ☆159 Updated 2 years ago
- Megatron's multi-modal data loader ☆243 Updated 3 weeks ago
- Root Mean Square Layer Normalization ☆254 Updated 2 years ago
- Unified Normalization (ACM MM'22) By Qiming Yang, Kai Zhang, Chaoxiang Lan, Zhi Yang, Zheyang Li, Wenming Tan, Jun Xiao, and Shiliang P… ☆34 Updated 2 years ago
- Easily benchmark PyTorch model FLOPs, latency, throughput, allocated gpu memory and energy consumption ☆107 Updated 2 years ago
- Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs which are semantically s… ☆142 Updated last year
- Python pdb for multiple processes ☆58 Updated 4 months ago
- EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning (ACL 2023) ☆31 Updated 2 years ago