Extracts plain text, language identification and more metadata from WARC records
☆23Oct 1, 2025Updated 5 months ago
Alternatives and similar repositories for warc2text
Users that are interested in warc2text are comparing it to the libraries listed below
- ☆19Sep 16, 2025Updated 5 months ago
- Hosts text-to-speech corpus and speech synthesizers for African languages.☆18May 31, 2023Updated 2 years ago
- Library for fast text representation and classification.☆31Jan 9, 2024Updated 2 years ago
- Tool to fix bitexts and tag near-duplicates for removal☆34Sep 4, 2025Updated 5 months ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.☆160Jun 18, 2024Updated last year
- Targeted language identifier, based on FastText and Hunspell.☆38Sep 4, 2025Updated 5 months ago
- Tools for content datamining and NLP at scale☆44Jun 20, 2024Updated last year
- Hidden Engrams: Long Term Memory for Transformer Model Inference☆35Jun 26, 2021Updated 4 years ago
- ParaNames: A multilingual resource for parallel names☆39May 20, 2024Updated last year
- GUI for GHRepoSearcher. It allows searching online repositories on GitHub.☆10May 20, 2022Updated 3 years ago
- Minangkabau NLP corpus. PACLIC 2020☆10Jun 7, 2021Updated 4 years ago
- Maintenance Information Extraction (MaintIE)☆16Jun 29, 2024Updated last year
- Visual tool for SPARQL queries on graphol graphs☆10Oct 3, 2018Updated 7 years ago
- ☆12Updated this week
- Creating super-parallel corpora of more than 1500 unique languages for NLP research☆34Dec 8, 2022Updated 3 years ago
- Qt/Qml application using Google speech-to-text API to make voice commands☆11Jan 19, 2020Updated 6 years ago
- A Reactive Sparql Client written in Scala and Akka☆13Sep 18, 2023Updated 2 years ago
- ☆11Feb 24, 2022Updated 4 years ago
- Terminal tool that converts files encoding to UTF-8☆10Oct 5, 2019Updated 6 years ago
- Quickly run SchemaSpy on a database and serve the results☆10Mar 24, 2021Updated 4 years ago
- A basic DNN tutorial in PyTorch, for persons without a background in Linux, Python, or remote servers☆10Apr 2, 2020Updated 5 years ago
- Spanish text summarization demo using CoreNLP☆10Sep 13, 2014Updated 11 years ago
- A tool to build custom application simulators through declarative configuration☆11Dec 15, 2025Updated 2 months ago
- Simple single file header for creating zero imports drivers. Can be useful for bypassing forensic memory analysis performed by anticheats…☆16Jun 10, 2025Updated 8 months ago
- ☆10May 28, 2022Updated 3 years ago
- LLM inference in C/C++☆24Updated this week
- A Font with extensive coverage of Unicode13 as of March 2020 (part of Unicode Fonts for Ancient Scripts)☆15Mar 26, 2020Updated 5 years ago
- Implementation of W3C's R2RML and Direct Mapping specifications☆10Oct 12, 2020Updated 5 years ago
- Lossless normalization of uppercase characters☆11Jul 3, 2023Updated 2 years ago
- MOVIO - Online Virtual Exhibitions☆15Nov 23, 2020Updated 5 years ago
- C++ parser to read data from MATLAB .mat files☆10Oct 12, 2014Updated 11 years ago
- ChatGPT solutions for the MLE interview☆14Dec 9, 2022Updated 3 years ago
- A simple code generator of JSON marshalers for Go and TinyGo.☆10Feb 9, 2026Updated 2 weeks ago
- An abstract, safe, and concise color conversion library for Rust nightly. This requires the feature adt_const_params☆12Nov 18, 2022Updated 3 years ago
- Tunisian Arabish Corpus☆11Mar 12, 2024Updated last year
- ☆15Mar 4, 2017Updated 8 years ago
- Very small standalone full-text search HTTP/SCGI server☆14Jun 11, 2025Updated 8 months ago
- CKAN extension for data.world☆12Dec 5, 2023Updated 2 years ago
- Make Metabase More Awesome☆16Jul 24, 2024Updated last year