euirim/goodwiki

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/euirim/goodwiki)

euirim / goodwiki

Package and scripts used to build a dataset of Wikipedia articles in Markdown.

☆20

Alternatives and similar repositories for goodwiki

Users that are interested in goodwiki are comparing it to the libraries listed below

Sorting:

ctlllll / reward_collapse
View on GitHub
☆26May 30, 2023Updated 2 years ago
CarperAI / squeakily
View on GitHub
A library for squeakily cleaning and filtering language datasets.
☆50Jul 10, 2023Updated 2 years ago
BryanLunduke / Netiquette2020
View on GitHub
Network Etiquette (Netiquette) -- Written with 2020 technology in mind
☆10Nov 19, 2021Updated 4 years ago
SeanSdahl / PytorchDataloaderForTensorflow
View on GitHub
This repository defines a python class that can be used to load data for the tf.keras.model.fit_generator function by using a torch.utils…
☆11Oct 26, 2024Updated last year
allenai / bff
View on GitHub
☆38Apr 17, 2024Updated last year
oscar-project / ungoliant
View on GitHub
The pipeline for the OSCAR corpus
☆176Nov 9, 2025Updated 3 months ago
antonio-f / BERT_from_scratch
View on GitHub
Training a BERT model from scratch.
☆11Oct 15, 2023Updated 2 years ago
evintunador / auto-video-editing-suite
View on GitHub
All the tools that allow me to never ever open up Final Cut
☆11Feb 16, 2025Updated last year
moznion / go-json-ice
View on GitHub
A simple code generator of JSON marshaler for go and tinygo.
☆10Feb 9, 2026Updated 3 weeks ago
iesl / s-diora
View on GitHub
☆12Jan 29, 2021Updated 5 years ago
hccngu / DialCoT
View on GitHub
DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models
☆13Nov 2, 2023Updated 2 years ago
louisowen6 / quora_paraphrasing_id
View on GitHub
Quora Paraphrasing Dataset Bahasa Indonesia Version
☆11Apr 18, 2021Updated 4 years ago
zhuyunqi96 / LoraLPrun
View on GitHub
☆13May 21, 2023Updated 2 years ago
kztakemoto / mmllm
View on GitHub
Moral Machine Experiment on LLMs
☆11Updated this week
ir-nlp-csui / indo-law
View on GitHub
Indonesian law dataset containing section annotation of court decision documents
☆17Jul 7, 2022Updated 3 years ago
recursal / minmodmon
View on GitHub
Mini Model Daemon
☆12Nov 9, 2024Updated last year
QuasarBrains / Simple-Agent
View on GitHub
A simple agent powered by LLMs that performs tasks.
☆13Apr 25, 2025Updated 10 months ago
TimDaub / react-envelope-graph
View on GitHub
A drag-and-drop-enabled, responsive, envelope graph that allows to shape a wave with attack, decay, sustain and release
☆11Jan 5, 2023Updated 3 years ago
sade-adrien / SteloCoder
View on GitHub
☆16Dec 21, 2023Updated 2 years ago
hugcis / evolving-structures-in-complex-systems
View on GitHub
Dataset and code to reproduce the results of the paper "Evolving Structures in Complex Systems"
☆11Dec 16, 2019Updated 6 years ago
uzushino / little-annoy
View on GitHub
A minimal implementation of spotify/annoy in pure rust
☆11Mar 2, 2023Updated 3 years ago
hengwang322 / sponsor_investigator
View on GitHub
☆10May 28, 2022Updated 3 years ago
rian-dolphin / fasthtml-chat
View on GitHub
A chat implementation for FastHTML
☆11Sep 14, 2025Updated 5 months ago
Troglio / troglio
View on GitHub
Turn Trello into a CMS to power all your websites and apps.
☆10May 12, 2018Updated 7 years ago
iantbutler01 / ditty
View on GitHub
A library for simplifying training with multi gpu setups in the HuggingFace / PyTorch ecosystem.
☆16Jan 9, 2026Updated last month
JustlyAI / lmss_entity_extractor
View on GitHub
Tool to apply Legal Matter Specification Standard (LMSS) to documents
☆12Aug 15, 2024Updated last year
rrkarim / unbounded-cache-lm
View on GitHub
Unbounded cache model for online language modeling with open vocabulary
☆11Feb 15, 2019Updated 7 years ago
ferenci-tamas / vonat-keses
View on GitHub
Vonatkésési statisztika
☆20Updated this week
Helsinki-NLP / OPUS-MT-testsets
View on GitHub
benchmarks for evaluating MT models
☆11Jun 26, 2024Updated last year
EhsanMashhadi / ISSRE2023-BugSeverityPrediction
View on GitHub
Code of our paper "Method-Level Bug Severity Prediction using Source Code Metrics and LLMs" which is accepted to ISSRE 2023.
☆10Nov 12, 2023Updated 2 years ago
JJGO / thunderpack
View on GitHub
☆11Sep 8, 2024Updated last year
OppaiCyber / XppaiWRT
View on GitHub
Telegram bot framework written in PHP for OpenWRT
☆12Nov 27, 2022Updated 3 years ago
ariaghora / villard
View on GitHub
A pipeline framework for data science projects
☆10Aug 9, 2022Updated 3 years ago
davidbrochart / ipyhtmx
View on GitHub
Build modern UIs in Jupyter with Python
☆12Dec 28, 2022Updated 3 years ago
matlab-deep-learning / quantization-aware-training
View on GitHub
This example shows how to perform quantization aware training for transfer learned MobileNet-v2 network.
☆12Dec 19, 2023Updated 2 years ago
ekurtulus / tied-augment
View on GitHub
Tied-Augment: Controlling Representation Similarity Improves Data Augmentation
☆14Oct 1, 2023Updated 2 years ago
dragnet-org / dragnet_data
View on GitHub
code and data used to build a training dataset for dragnet models
☆10Nov 29, 2020Updated 5 years ago
rosbit / duktape-bridge
View on GitHub
A very easy-to-use wrapper of Duktape JavaScript engine, including wrappers for C, Go and Java. The bridge wrapper is also supporting mo…
☆14Dec 20, 2021Updated 4 years ago
noanabeshima / github-downloader
View on GitHub
Script for downloading GitHub.
☆13Sep 24, 2020Updated 5 years ago