Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code
☆64Feb 8, 2025Updated last year
Alternatives and similar repositories for top-open-subtitles-sentences
Users that are interested in top-open-subtitles-sentences are comparing it to the libraries listed below.
- Simple word-to-frequency mappings for the German language based on text corpora and using the CISTEM stemmer.☆14Apr 3, 2021Updated 5 years ago
- temporary files created by opensubtitles-scraper☆17Feb 3, 2026Updated 4 months ago
- Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code☆111Aug 14, 2023Updated 2 years ago
- A Nix Flake for SumatraPDF☆14Mar 12, 2026Updated 3 months ago
- Arabic Grapheme-to-Phoneme (G2P) Conversion☆16Mar 15, 2025Updated last year
- Practical example from Human-in-the-Loop Machine Learning book☆11Oct 28, 2021Updated 4 years ago
- Cross platform local yomichan/yomitan server to play audio (without Anki)☆13Nov 16, 2025Updated 7 months ago
- Climate Data Rescue is an archival data rescue platform using Ruby on Rails.☆15Mar 10, 2026Updated 3 months ago
- An Anki plugin to sort your new cards.☆25Dec 16, 2025Updated 6 months ago
- Base16 tomorrow night stylus theme for gogs☆10Jan 15, 2024Updated 2 years ago
- ☆18Oct 27, 2025Updated 8 months ago
- @DHRI-Curriculum Session on text analysis with NLTK, including discussion of cleaning data, creating text corpora, and analyzing texts pr…☆11May 13, 2021Updated 5 years ago
- Extract plain text from Arabic Wikipedia dumps.☆13Jun 15, 2014Updated 12 years ago
- Creating super-parallel corpora of more than 1500+ unique languages for NLP research☆34Dec 8, 2022Updated 3 years ago
- Beautiful animated SVG or GIF kanji from KanjiVG data set.☆73Jul 16, 2016Updated 9 years ago
- The Arabic Error Type Annotation tool aims to annotate Arabic error types following the ALC tagset annotation.☆11Oct 28, 2022Updated 3 years ago
- English conversation corpus for conversational TTS.☆21Mar 13, 2023Updated 3 years ago
- A Docker version of Learning Analytics as a Service (LAaaS)☆11Feb 15, 2024Updated 2 years ago
- Anthy is a kana-kanji conversion engine for Japanese. It converts roma-ji to kana, and the kana text to a mixed kana and kanji. Merge Deb…☆16Feb 25, 2023Updated 3 years ago
- CLI tool for discovering related base domains using WhoisXMLAPI's reverse Whois endpoints☆12Jun 15, 2024Updated 2 years ago
- Tools for calculating psycholinguistically-relevant metrics of language statistics using transformer language models☆13Nov 11, 2022Updated 3 years ago
- Visual Hash for matching copies of visually similar images.☆16Mar 17, 2025Updated last year
- Unveiling Cyber Threats: From assets to Vulnerability Insights☆18Oct 22, 2024Updated last year
- Materials for "Prompting is not a substitute for probability measurements in large language models" (EMNLP 2023)☆24Oct 24, 2023Updated 2 years ago
- Global ASP - African Storybook Project for the World☆18Jun 8, 2026Updated 3 weeks ago
- Text to speech REST API for multiple TTS engines☆36Mar 19, 2024Updated 2 years ago
- ☆18Sep 10, 2021Updated 4 years ago
- A simple bug bounty utility tool to remove uninteresting entries from a list of URLs.☆13Jul 22, 2024Updated last year
- simple kv store for streams☆36Mar 14, 2013Updated 13 years ago
- A BugBounty playbook covering vulnerability bypasses, payloads, and quick checks for OWASP Top 10 + extras.☆23Sep 29, 2025Updated 9 months ago
- ☆15Aug 30, 2021Updated 4 years ago
- Zurich Morphological Lexicon for German: a tool to extract a morphological lexicon from Wiktionary☆12Aug 10, 2023Updated 2 years ago
- A pytest plugin for testing Anki add-ons☆34Apr 10, 2025Updated last year
- vertical search crawler☆38Jan 9, 2012Updated 14 years ago
- statically generated weekly digest of articles read in Pocket☆10May 14, 2019Updated 7 years ago
- bookmarklet readability using mozilla version of readability☆14Apr 6, 2022Updated 4 years ago
- A modular URL deduplication tool.☆19Feb 19, 2025Updated last year
- DataReaper is a powerful Python tool designed to harvest data from publicly accessible HTTP servers. It combines the capabilities of Shod…☆16May 14, 2026Updated last month
- A stylesheet based on Richard Rutter's book Web Typography.☆10Dec 6, 2018Updated 7 years ago