A text generation method that returns a generator, streaming out each token in real time during inference; built on Hugging Face Transformers.
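The idea can be sketched in plain Python: instead of waiting for the full sequence, the generation loop `yield`s each token as soon as it is sampled. This is a conceptual sketch only, not the library's actual API; `stream_generate` and `toy_step` are hypothetical stand-ins for a real model's forward pass.

```python
def stream_generate(model_step, prompt_ids, max_new_tokens=8, eos_id=0):
    """Yield one token id at a time instead of returning the full sequence.

    model_step is a stand-in for a single forward pass that returns the
    next token id given the ids generated so far.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = model_step(ids)   # one forward pass -> next token id
        if next_id == eos_id:       # stop at end-of-sequence
            break
        ids.append(next_id)
        yield next_id               # caller sees the token immediately

# Toy "model": emits increasing ids, then EOS once it reaches 5.
def toy_step(ids):
    return ids[-1] + 1 if ids[-1] < 5 else 0

tokens = list(stream_generate(toy_step, [1]))
print(tokens)  # [2, 3, 4, 5]
```

In the real library the same pattern sits on top of a Transformers model's sampling loop, so a chat UI can render tokens as they arrive rather than after generation completes.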
☆96 · Updated Mar 11, 2024
Alternatives and similar repositories for transformers-stream-generator
Users interested in transformers-stream-generator are comparing it to the libraries listed below.
- ☆13 · Updated Aug 23, 2024
- Official Repository for Paper "BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Compet… ☆18 · Updated Aug 28, 2024
- AIGC evals ☆10 · Updated Dec 2, 2023
- Sampling-Based Minimum Bayes-Risk Decoding for Neural Machine Translation ☆16 · Updated Oct 14, 2022
- Accelerate generating vectors by using ONNX models ☆18 · Updated Jan 23, 2024
- A library for data streaming and augmentation ☆21 · Updated May 5, 2025
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆40 · Updated Nov 11, 2024
- ☆25 · Updated Jul 12, 2017
- ⚡ Boost inference speed of GPT models in Transformers with ONNX Runtime ☆52 · Updated Aug 20, 2023
- See details in https://github.com/pytorch/xla/blob/r1.12/torch_xla/distributed/fsdp/README.md ☆25 · Updated Dec 22, 2022
- ☆90 · Updated Jul 4, 2024
- A WeChat auto-chat bot driven by mouse and keyboard automation ☆13 · Updated Nov 26, 2024
- ☆65 · Updated Apr 27, 2024
- Transformer-related optimization, including BERT and GPT ☆6,397 · Updated Mar 27, 2024
- A terminal dashboard for Pipecat ☆41 · Updated this week
- ☆12 · Updated Apr 24, 2024
- A CLI tool to convert JSON Resume schema to RenderCV schema ☆20 · Updated Mar 11, 2025
- Minimal code implementing Agent skills dispatch, with support for per-user personalized profiles ☆30 · Updated this week
- TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs ☆24 · Updated Sep 21, 2025
- An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification ☆28 · Updated Apr 15, 2025
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆141 · Updated Dec 6, 2024
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated Jul 28, 2024
- Large language model fine-tuning for BLOOM, OPT, GPT, GPT-2, LLaMA, LLaMA-2, CPM-Ant, and more ☆100 · Updated Apr 24, 2024
- Chinese CLIP models with SOTA performance ☆60 · Updated Aug 28, 2023
- A tiny server to run local inference on MLX models in the style of OpenAI ☆13 · Updated Jan 31, 2024
- A pure-C++ cross-platform LLM acceleration library with Python bindings, supporting ChatGLM-6B, LLaMA, Baichuan, and MOSS base models on x86 / ARM ☆12 · Updated Jan 30, 2026
- Accelerating GOT-OCRv2 with vLLM ☆11 · Updated Nov 15, 2024
- [NeurIPS 2025] Official source code for the paper "L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models" ☆24 · Updated Oct 29, 2025
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm ☆5,034 · Updated Apr 11, 2025
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed ☆2,101 · Updated Jun 30, 2025
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,322 · Updated Mar 6, 2025
- ☆15 · Updated Jun 12, 2023
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆11 · Updated Sep 4, 2025
- Cairo graphics library in Xcode for iOS ☆13 · Updated Sep 30, 2016
- ☆16 · Updated Mar 6, 2026
- Backup your Docker volumes ☆17 · Updated May 16, 2023
- Reverse-engineered ChatGPT API ☆10 · Updated Feb 14, 2023
- Train your own object detection model with ATSS! Super-detailed tutorial, with a downloadable PDF guide ☆10 · Updated Jul 28, 2020
- Gated pretrained Transformer model for robust denoised sequence-to-sequence modelling ☆10 · Updated May 29, 2021