allenai/smashed



SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batching, and more. Supports datasets from Huggingface, torchdata iterables, or simple lists of dictionaries.

☆35

Alternatives and similar repositories for smashed

Users who are interested in smashed are comparing it to the libraries listed below.


amazon-science / wqa-multi-sentence-inference
This repository contains code used for our Multi Sentence Inference NAACL'22 paper.
☆12 · Mar 6, 2023 · Updated 3 years ago
prohandler / GS-Bulk-Emails
Google App Scripts that sends a number of emails from the specific number and that tracks the open status of each email
☆17 · Dec 11, 2024 · Updated last year
clinicalml / cotrain-prompting
Code for co-training large language models (e.g. T0) with smaller ones (e.g. BERT) to boost few-shot performance
☆16 · Sep 23, 2022 · Updated 3 years ago
multilexsum / dataset
Multi-LexSum is an abstractive summarization dataset for US Civil Rights Lawsuits
☆23 · Dec 15, 2022 · Updated 3 years ago
ryannair05 / Tempus-Romanum
Show the time in Roman Numerals
☆12 · Jan 23, 2020 · Updated 6 years ago
allenai / mmda
multimodal document analysis
☆166 · May 14, 2026 · Updated 2 months ago
himkt / allennlp-optuna
⚡️ AllenNLP plugin for adding subcommands to use Optuna, making hyperparameter optimization easy
☆33 · Nov 23, 2021 · Updated 4 years ago
allenai / mslr-shared-task
Multidocument Summarization for Literature Review Shared Task 2022
☆30 · Oct 16, 2022 · Updated 3 years ago
iKernels / transformers-lightning
A collection of Models, Datasets, DataModules, Callbacks, Metrics, Losses and Loggers to better integrate pytorch-lightning with transfor…
☆47 · May 29, 2023 · Updated 3 years ago
mscarey / legislice
API client for fetching and comparing passages from legislation
☆14 · Jun 29, 2026 · Updated 3 weeks ago
JSv4 / AtticusClassifier
Trained BERT and Word2Vec legal clause classifiers for SPACY using the Atticus Project's Open Source Contract Label Corpus
☆14 · Jan 2, 2021 · Updated 5 years ago
allenai / neural-wire-viz
Javascript library for visualizing dynamic neural networks across time.
☆13 · Dec 9, 2019 · Updated 6 years ago
viking-sudo-rm / rusty-dawg
Rust library for indexing and quickly searching large pretraining corpora
☆31 · Oct 30, 2025 · Updated 8 months ago
mscarey / justopinion
Download client for legal opinions
☆13 · Jun 12, 2026 · Updated last month
richarddwang / hugdatafast
The elegant integration of huggingface/nlp and fastai2 and handy transforms using pure huggingface/nlp
☆19 · Oct 6, 2020 · Updated 5 years ago
allenai / scidocs
Dataset accompanying the SPECTER model
☆148 · Dec 19, 2022 · Updated 3 years ago
bettyblocks / material-ui-component-set
A Betty Blocks Component Set based on Material UI
☆25 · May 21, 2026 · Updated 2 months ago
ruiqi-zhong / DescribeDistributionalDifferences
Code for preprint: Summarizing Differences between Text Distributions with Natural Language
☆43 · Feb 24, 2023 · Updated 3 years ago
allenai / vila
Incorporating VIsual LAyout Structures for Scientific Text Classification
☆180 · Mar 18, 2023 · Updated 3 years ago
facebookresearch / CCQA
CCQA A New Web-Scale Question Answering Dataset for Model Pre-Training
☆33 · Jul 20, 2022 · Updated 4 years ago
allenai / beaker-gantry
Gantry provides an API that streamlines running experiments in Beaker
☆31 · Updated this week
AIPHES / DiscoScore
DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence
☆37 · Jul 25, 2023 · Updated 2 years ago
allenai / allennlp-simple-server-visualization
static-dir files for a simple-server demo with ReactJS visualizations
☆16 · Nov 28, 2018 · Updated 7 years ago
allefeld / atom-pdfjs-viewer
Themed, fully featured PDF viewer for the Atom editor
☆12 · Jan 28, 2026 · Updated 5 months ago
tleyden / open-ocr-client
Client library for OpenOCR
☆32 · Dec 3, 2014 · Updated 11 years ago
ibm-hyperknowledge / hkpy
A Python module to provide software abstractions to ease accessing hyperknowledge graphs
☆11 · Dec 19, 2024 · Updated last year
nstawfik / MedSentEval
☆11 · Nov 19, 2020 · Updated 5 years ago
unitedstates / BillMap
Utilities and applications for the FlatGov project by Demand Progress
☆17 · Feb 8, 2023 · Updated 3 years ago
krassowski / plotnine3d
3D geoms for plotnine (grammar of graphics in Python)
☆13 · Aug 5, 2022 · Updated 3 years ago
accordproject / concerto-codegen
☆13 · Jul 14, 2026 · Updated last week
kachayev / dataclasses-tensor
Easily serialize dataclasses to and from tensors (PyTorch, NumPy)
☆18 · Apr 10, 2021 · Updated 5 years ago
rclement / datasette-ml
A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models
☆17 · Updated this week
iliaschalkidis / flash-roberta
Hugging Face RoBERTa with Flash Attention 2
☆24 · Sep 14, 2025 · Updated 10 months ago
allenai / tango
Organize your experiments into discrete steps that can be cached and reused throughout the lifetime of your research project.
☆572 · May 30, 2024 · Updated 2 years ago
iamtrask / python-paillier
A library for Partially Homomorphic Encryption in Python
☆12 · May 30, 2017 · Updated 9 years ago
allenai / scirepeval
SciRepEval benchmark training and evaluation scripts
☆89 · May 5, 2026 · Updated 2 months ago
ICLRandD / LegalHackers2019
This repository contains materials for the Open Legal Data Forum at the Legal Hacker 2019 (September 2019 + Brooklyn, NYC)
☆17 · Dec 8, 2022 · Updated 3 years ago
HazyResearch / tabi
Code release for Type-Aware Bi-Encoders for Open-Domain Entity Retrieval
☆19 · Sep 24, 2022 · Updated 3 years ago
unicamp-dl / InRanker
☆47 · Feb 7, 2024 · Updated 2 years ago