mrjleo/boilernet

Boilerplate Removal using Deep Learning

☆83

Alternatives and similar repositories for boilernet

Users who are interested in boilernet are comparing it to the libraries listed below.

dalab / web2text
Source code for the paper "Web2Text: Deep Structured Boilerplate Removal", full paper @ ECIR'18
☆169 · Oct 28, 2021 · Updated 4 years ago
rsling / texrex
texrex web page cleaning & ClaraX random walk crawler
☆11 · Dec 13, 2021 · Updated 4 years ago
seanmacavaney / autoqrels
☆15 · Feb 20, 2025 · Updated last year
jsinger67 / Lelek
F# LL(k) Parser generator.
☆12 · Oct 26, 2022 · Updated 3 years ago
MohamedHmini / iww
AI based web-wrapper for web-content-extraction
☆102 · Feb 6, 2023 · Updated 3 years ago
capreolus-ir / diffir
Tool for comparing two ranked lists (TREC run files)
☆20 · Nov 9, 2022 · Updated 3 years ago
colmex / frontera_example
Example frontera project
☆12 · Aug 10, 2017 · Updated 8 years ago
tmu-nlp / paraphrase-corpus
Tokyo Metropolitan University Paraphrase Corpus (TMUP)
☆11 · Jun 12, 2017 · Updated 9 years ago
miso-belica / jusText
Heuristic based boilerplate removal tool
☆818 · Feb 25, 2025 · Updated last year
a5huynh / scrapyd-playground
Get started with scrapy and scrapyd
☆12 · Mar 3, 2015 · Updated 11 years ago
HazyResearch / random_embedding
☆15 · Jun 10, 2022 · Updated 4 years ago
informagi / GeeseDB
Graph Engine for Exploration and Search
☆42 · Jan 26, 2024 · Updated 2 years ago
KOBA789 / isunarabe-images
Pipeline for building images for ISUNARABE practice VMs
☆11 · Mar 20, 2024 · Updated 2 years ago
osirrc / osirrc2019-library
Official library of images for the SIGIR 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)
☆13 · Jul 7, 2019 · Updated 6 years ago
namin / linkrev
☆15 · Aug 15, 2012 · Updated 13 years ago
cstrahan / obsidian-requirejs
Write reusable JavaScript functions using Asynchronous Module Definitions
☆11 · May 18, 2022 · Updated 4 years ago
MurtuzaBohra / SimpDOM
Simplified DOM Trees for Transferable Attribute Extraction from the Web
☆43 · Sep 27, 2024 · Updated last year
ir-anthology / ir-anthology-old
Software for building the IR Anthology.
☆11 · Sep 19, 2023 · Updated 2 years ago
irgroup / repro_eval
A Python Interface to Reproducibility Measures of System-Oriented IR Experiments
☆11 · Dec 2, 2025 · Updated 7 months ago
kms9 / learn_rag_by_rag
Learn RAG by using RAG
☆10 · Sep 6, 2024 · Updated last year
rankbiased / rbstar
Rank-Biased Precision, Overlap, Recall, and Alignment
☆12 · Jun 15, 2026 · Updated 3 weeks ago
usnistgov / trec-browser
Metadata browser of TREC
☆10 · May 19, 2026 · Updated last month
c32168 / dyntamic
Generate pydantic models from JSON Schema
☆24 · Sep 19, 2023 · Updated 2 years ago
citiususc / pyplexity
Cleaning tool for web scraped text
☆40 · Jun 7, 2023 · Updated 3 years ago
imc-trading / telerista
☆17 · Apr 4, 2020 · Updated 6 years ago
maxnth / LineAug
Augment line images for improving OCR datasets
☆10 · Oct 4, 2023 · Updated 2 years ago
chauff / readingGroup
Overview of IR/NLP papers covered in my team's reading group.
☆10 · May 5, 2020 · Updated 6 years ago
blookot / elastic-gdpr-scanner
Scan Elasticsearch instances to check for GDPR compliance
☆14 · May 22, 2025 · Updated last year
osirrc / jig
Jig for the Open-Source IR Replicability Challenge (OSIRRC)
☆13 · Dec 8, 2022 · Updated 3 years ago
AlexGidiotis / Advanced-ML-techniques
This repo contains implementation of advanced ML techniques. Includes model ensembles, cost-sensitive learning and dealing with class imb…
☆18 · Jun 13, 2018 · Updated 8 years ago
astronomer / airflow-testing-skeleton
A skeleton project for testing Airflow code
☆20 · Sep 17, 2021 · Updated 4 years ago
xrr233 / Webformer
SIGIR-2022 Webformer: Pre-training with Web Pages for Information Retrieval
☆50 · Sep 20, 2022 · Updated 3 years ago
rickardp / bitsandbytes
8-bit CUDA functions for PyTorch
☆18 · Dec 23, 2024 · Updated last year
peidaqi / chiaki
Free and Open Source PS4 Remote Play Client
☆14 · Jul 16, 2022 · Updated 3 years ago
ishiko732 / WordSearch
Parses word meanings from dictionaries and provides a word library for the Anki Fast Words Query plugin
☆12 · Apr 1, 2021 · Updated 5 years ago
H-TayyarMadabushi / Cost-Sensitive_Bert_and_Transformers
Transformers for Cost-Sensitive BERT for Generalisable Sentence Classification on Imbalanced Data
☆18 · May 28, 2020 · Updated 6 years ago
xuanzebi / NER-PyTorch
A record of the code I use for NER tasks with BILSTM-CRF, ELMo, BERT, and others.
☆26 · Feb 6, 2020 · Updated 6 years ago
baharev / sdopt-tearing
Exact and heuristic methods for tearing
☆13 · Sep 2, 2023 · Updated 2 years ago
X-LANCE / WebSRC-Baseline
[EMNLP 2021] The baseline code for WebSRC dataset.
☆51 · Apr 2, 2025 · Updated last year