Collection of scripts to pretrain T5 on unlabeled text (unsupervised), using PyTorch Lightning. CORD-19 pretraining is provided as an example.
☆32Apr 26, 2021Updated 4 years ago
Alternatives and similar repositories for Pretraining-T5-PyTorch-Lightning
Users that are interested in Pretraining-T5-PyTorch-Lightning are comparing it to the libraries listed below
- Continue Pretraining T5 on custom dataset based on available pretrained model checkpoints☆38Mar 21, 2021Updated 4 years ago
- ☆22Nov 25, 2021Updated 4 years ago
- ☆13Jun 19, 2021Updated 4 years ago
- Generating artificial disfluencies from fluent text easily and promptly☆15Sep 28, 2022Updated 3 years ago
- The official repository for Dynamic Clustering and Cluster Contrastive Learning (DCCC).☆14Dec 15, 2023Updated 2 years ago
- ☆13Oct 21, 2021Updated 4 years ago
- A simple script for detecting and removing cryptomining malware☆19Apr 4, 2022Updated 3 years ago
- ☆45Sep 12, 2021Updated 4 years ago
- Hugging Face RoBERTa with Flash Attention 2☆24Sep 14, 2025Updated 5 months ago
- ☆23Feb 6, 2022Updated 4 years ago
- [ACL25] FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation☆46Jan 28, 2026Updated last month
- Few-shot Learning with Auxiliary Data☆31Dec 8, 2023Updated 2 years ago
- Sentence tokenizer for clinical/medical text.☆28Jun 3, 2024Updated last year
- CoditT5: Pretraining for Source Code and Natural Language Editing☆28Jan 16, 2025Updated last year
- ☆26Aug 14, 2022Updated 3 years ago
- Instruction fine-tuning of the BLOOM model☆24Jun 15, 2023Updated 2 years ago
- Source code for "Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models", ICLR 2020.☆30Jun 28, 2020Updated 5 years ago
- A simple example for finetuning HuggingFace T5 model. Includes code for intermediate generation.☆26Nov 11, 2020Updated 5 years ago
- FUSION is an open-source project aimed at revolutionizing networking through the simulation of advanced SD-EONs and AI-enhanced networks,…☆13Feb 18, 2026Updated 2 weeks ago
- Winning solution for the Kaggle Feedback Prize Challenge.☆66Sep 5, 2022Updated 3 years ago
- Data synthesis tool for simply and efficiently generating large-model training data for different business scenarios☆41Jan 2, 2025Updated last year
- Artifact code release for paper "Uniform-Cost Multi-Path Routing for Reconfigurable Data Center Networks"☆12Sep 5, 2024Updated last year
- Arabic News Stance Corpus☆11Feb 5, 2021Updated 5 years ago
- ☆34Oct 30, 2020Updated 5 years ago
- CFBench: A Comprehensive Constraints-Following Benchmark for LLMs☆48Aug 26, 2024Updated last year
- Top-6 solution for the Tianchi COVID-19 similar sentence pair judgment competition☆77Jun 22, 2022Updated 3 years ago
- Token-free Language Modeling with ByGPT5 & Friends!☆12Jul 18, 2025Updated 7 months ago
- Direction of arrival (DOA) estimation is a fundamental problem in array signal processing with applications spanning radar, sonar, wirele…☆26Sep 1, 2025Updated 6 months ago
- Code Roberta version of RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder☆10Mar 16, 2023Updated 2 years ago
- The official implementation of the paper "Text Classification in the Wild: a Large-scale Long-tailed Name Normalization Dataset"(ICASSP 2…☆12Feb 19, 2023Updated 3 years ago
- Python programs and procedures that facilitate local application of the earth2observe global water resources reanalysis☆10Nov 21, 2017Updated 8 years ago
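- A development roadmap for machine learning, deep learning, generative adversarial networks (GAN), graph neural networks (GNN), NLP, and big data, with extensive source code (Python, PyTorch) to help readers digest the fundamentals, get through interviews, and go from novice to qualified…☆10Feb 25, 2020Updated 6 years ago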
- Human ID classification using mmwave radar point cloud☆13Oct 18, 2025Updated 4 months ago
- The codes for our ACL'22 paper: PRBOOST: Prompt-Based Rule Discovery and Boosting for Interactive Weakly-Supervised Learning.☆35Mar 18, 2022Updated 3 years ago
- ☆16Jul 20, 2025Updated 7 months ago
- ☆14Jul 5, 2023Updated 2 years ago
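- PyTorch versions of loss functions, adapted from the 科学空间 (Scientific Spaces) articles "Mitigating class imbalance through the idea of mutual information" and "Generalizing 'softmax + cross-entropy' to multi-label classification"☆12Aug 22, 2021Updated 4 years ago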
- Structural displacement monitoring using ground-based synthetic aperture radar: Implementation of 3D displacement vector☆14Mar 6, 2024Updated 2 years ago
- ☆10May 1, 2025Updated 10 months ago