rwightman / genalog
Genalog is an open-source, cross-platform Python package for generating synthetic document images with custom degradations and text alignment capabilities.
☆44 Updated 2 years ago
Alternatives and similar repositories for genalog
Users that are interested in genalog are comparing it to the libraries listed below
- A library for squeakily cleaning and filtering language datasets. ☆49 Updated 2 years ago
- Trade any tensors over the network ☆30 Updated 2 years ago
- QLoRA for Masked Language Modeling ☆22 Updated 2 years ago
- Code for NeurIPS LLM Efficiency Challenge ☆60 Updated last year
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆35 Updated 2 years ago
- QLoRA with Enhanced Multi GPU Support ☆37 Updated 2 years ago
- minimal pytorch implementation of bm25 (with sparse tensors) ☆104 Updated 3 months ago
- ML/DL Math and Method notes ☆66 Updated 2 years ago
- ☆22 Updated 2 years ago
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ☆160 Updated last year
- experiments with inference on llama ☆103 Updated last year
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 Updated 4 months ago
- NLP with Rust for Python 🦀🐍 ☆70 Updated 8 months ago
- Truly flash implementation of DeBERTa disentangled attention mechanism. ☆67 Updated last week
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆28 Updated last year
- Using short models to classify long texts ☆21 Updated 2 years ago
- ☆47 Updated last year
- A place to store reusable transformer components of my own creation or found on the interwebs ☆70 Updated 2 weeks ago
- Seamless interface for using PyTorch distributed with Jupyter notebooks ☆57 Updated 4 months ago
- Various handy scripts to quickly setup new Linux and Windows sandboxes, containers and WSL. ☆40 Updated last week
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping. ☆27 Updated 2 years ago
- ☆53 Updated 11 months ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆96 Updated 2 years ago
- Multi-Domain Expert Learning ☆67 Updated 2 years ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆45 Updated last year
- Utilities for Training Very Large Models ☆58 Updated last year
- Simple GRPO scripts and configurations. ☆59 Updated 11 months ago
- ☆94 Updated 2 years ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆23 Updated last year
- Pre-train Static Word Embeddings ☆94 Updated 4 months ago