KamWithK / PyParquetLoaders
Easy, efficient and Pythonic data loading of Parquet files for PyTorch-based libraries
☆24 Updated 4 years ago
Alternatives and similar repositories for PyParquetLoaders:
Users who are interested in PyParquetLoaders are comparing it to the libraries listed below:
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch ☆37 Updated 3 years ago
- ☆21 Updated 3 years ago
- A collection of Models, Datasets, DataModules, Callbacks, Metrics, Losses and Loggers to better integrate pytorch-lightning with transfor… ☆47 Updated last year
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆32 Updated 10 months ago
- My explorations into editing the knowledge and memories of an attention network ☆34 Updated 2 years ago
- Transformers at any scale ☆41 Updated last year
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆48 Updated 3 years ago
- Unofficially Implements https://arxiv.org/abs/2112.05682 to get Linear Memory Cost on Attention for PyTorch ☆12 Updated 3 years ago
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆29 Updated 2 years ago
- ☆32 Updated 2 years ago
- Parallel data preprocessing for NLP and ML. ☆34 Updated 4 months ago
- Implementation of OpenAI paper with Simple Noise Scale on Fastai V2 ☆19 Updated 3 years ago
- High performance pytorch modules ☆18 Updated 2 years ago
- Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch ☆45 Updated 4 years ago
- Automate issue discovery for your projects against Lightning nightly and releases. ☆46 Updated last week
- A python library for highly configurable transformers - easing model architecture search and experimentation. ☆49 Updated 3 years ago
- Repository for Multimodal AutoML Benchmark ☆65 Updated 3 years ago
- Embedding Recycling for Language models ☆38 Updated last year
- Retrieval with Learned Similarities (http://arxiv.org/abs/2407.15462, WWW'25 Oral) ☆41 Updated last month
- Axial Positional Embedding for Pytorch ☆76 Updated last month
- ☆15 Updated 3 years ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆93 Updated 2 years ago
- ☆16 Updated 2 months ago
- Bi-encoder entity linking architecture ☆44 Updated 6 months ago
- Exploring finetuning public checkpoints on filter 8K sequences on Pile ☆115 Updated 2 years ago
- Learning to Rank in PyTorch ☆81 Updated last year
- A case study of efficient training of large language models using commodity hardware. ☆69 Updated 2 years ago
- AdamW optimizer for bfloat16 models in pytorch 🔥. ☆32 Updated 9 months ago
- A place to store reusable transformer components of my own creation or found on the interwebs ☆48 Updated last week
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆18 Updated last month