qsyao / cudaBERT
A Fast Multi-processing BERT-Inference System
☆100Updated 2 years ago
Related projects
Alternatives and complementary repositories for cudaBERT
- InsNet Runs Instance-dependent Neural Networks with Padding-free Dynamic Batching.☆66Updated 3 years ago
- PyTorch source code reading, version 0.2.0☆88Updated 4 years ago
- OneFlow models for benchmarking.☆104Updated 3 months ago
- Place for meetup slides☆140Updated 4 years ago
- Tutorial code on how to build your own Deep Learning System in 2k Lines☆126Updated 7 years ago
- oneflow documentation☆68Updated 4 months ago
- Simple Dynamic Batching Inference☆145Updated 2 years ago
- DeepLearning Framework Performance Profiling Toolkit☆277Updated 2 years ago
- ☆51Updated last year
- Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL☆528Updated 4 years ago
- ☆123Updated 3 years ago
- ☆209Updated last year
- A small deep-learning framework with C++/Python/CUDA☆53Updated 6 years ago
- TensorFlow source code reading notes☆189Updated 6 years ago
- Multi-gpu/distributed training script in Tensorflow 1.x.☆17Updated 5 years ago
- AutodiffEngine☆13Updated 5 years ago
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, …☆104Updated 11 months ago
- Running BERT without Padding☆460Updated 2 years ago
- A distributed logistic regression system based on ps-lite.☆45Updated 7 years ago
- C++ model train&inference framework☆223Updated 4 years ago
- (Spring 2018) Assignment 2: Graph Executor with TVM☆124Updated 6 years ago
- Transformer related optimization, including BERT, GPT☆60Updated last year
- A high-performance framework for training wide-and-deep recommender systems on heterogeneous cluster☆157Updated 7 months ago
- TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models.☆90Updated last year
- Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.☆263Updated last year
- Implement BERT in pure C++☆32Updated 4 years ago
- notes on reading tensorflow source code☆13Updated 6 years ago
- An automatic differentiation framework with dynamic graph support☆101Updated 4 years ago
- HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of…☆133Updated 2 months ago
- Models and examples built with OneFlow☆96Updated last month