ENOT-AutoDL / gpt-j-6B-tensorrt-int8
GPT-J 6B inference on TensorRT with INT8 precision
☆11 · Updated last year
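The listing itself carries no usage snippet, so below is only a minimal, hypothetical sketch of what building an INT8 TensorRT engine for GPT-J typically involves; the ONNX file name, the FP16 fallback flag, and `MyCalibrator` are assumptions, not this repository's actual pipeline.

```python
# Hypothetical sketch of INT8 engine building with the TensorRT Python API.
# "gptj.onnx" and MyCalibrator are placeholders; the repository's actual
# export and quantization pipeline may differ.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("gptj.onnx", "rb") as f:  # placeholder ONNX export of GPT-J
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # request INT8 kernels
config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 fallback for unsupported layers
# Post-training INT8 needs calibration data unless the graph already carries
# Q/DQ nodes; MyCalibrator would subclass trt.IInt8EntropyCalibrator2.
# config.int8_calibrator = MyCalibrator()

engine_bytes = builder.build_serialized_network(network, config)
with open("gptj_int8.plan", "wb") as f:
    f.write(engine_bytes)
```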
Alternatives and similar repositories for gpt-j-6B-tensorrt-int8:
Users interested in gpt-j-6B-tensorrt-int8 are comparing it to the libraries listed below:
- sigma-MoE layer (☆18, updated last year)
- Truly flash T5 implementation! (☆63, updated 9 months ago)
- High-performance PyTorch modules (☆18, updated 2 years ago)
- My explorations into editing the knowledge and memories of an attention network (☆34, updated 2 years ago)
- Running massive simulations using RNNs on CPUs for building bots and all kinds of things. (☆13, updated 3 years ago)
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… (☆34, updated last year)
- Implements the SM3-II adaptive optimization algorithm for PyTorch. (☆33, updated 5 months ago)
- A boilerplate to use multiprocessing for your gRPC server in your Python project (☆25, updated 3 years ago)
- Index of URLs to PDF files all over the internet and scripts (☆21, updated last year)
- A dashboard for exploring timm learning rate schedulers (☆19, updated 2 months ago)
- C++ mosestokenizer (☆17, updated 11 months ago)
- Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch (☆59, updated 4 years ago)
- Implementation of N-Grammer, augmenting Transformers with latent n-grams, in PyTorch (☆72, updated 2 years ago)
- A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training (☆26, updated 2 years ago)
- (☆57, updated last year)
- This repository contains example code to build models on TPUs (☆30, updated 2 years ago)
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" (☆36, updated last year)
- [JMLR'20] NeurIPS 2019 MicroNet Challenge Efficient Language Modeling, Champion (☆40, updated 3 years ago)
- Large Scale Distributed Model Training strategy with Colossal AI and Lightning AI (☆57, updated last year)
- Memory-efficient transformer. Work in progress. (☆19, updated 2 years ago)
- A fast implementation of T5/UL2 in PyTorch using Flash Attention (☆82, updated 3 weeks ago)
- Zeta implementation of a reusable, plug-and-play feedforward from the paper "Exponentially Faster Language Modeling" (☆15, updated 3 months ago)
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data (☆21, updated 6 months ago)
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… (☆25, updated 10 months ago)
- (☆18, updated 8 months ago)
- Various transformers for FSDP research (☆36, updated 2 years ago)
- (☆13, updated 6 months ago)
- TorchServe + Streamlit for easily serving your HuggingFace NER models (☆32, updated 2 years ago)
- Experiments for XLM-V Transformers Integration (☆13, updated 2 years ago)
- Checkpointable dataset utilities for foundation model training (☆32, updated last year)