A part-of-speech tagger with support for domain adaptation and external resources.
☆24Oct 26, 2022Updated 3 years ago
Alternatives and similar repositories for SoMeWeTa
Users that are interested in SoMeWeTa are comparing it to the libraries listed below
- A tokenizer and sentence splitter for German and English web and social media texts.☆153Dec 9, 2024Updated last year
- German-language introduction to automated content analysis with R.☆17Sep 11, 2020Updated 5 years ago
- ☆14May 20, 2019Updated 6 years ago
- Compound splitter for German☆112Apr 5, 2020Updated 5 years ago
- German lemmatization with IWNLP as extension for spaCy☆26Jul 28, 2023Updated 2 years ago
- Named Entity Recognition (LSTM + CRF + FastText) with models for [historic] German☆26May 10, 2021Updated 4 years ago
- annotated hateful speech☆24Apr 6, 2019Updated 6 years ago
- convert DataFrame to libffm data format in parallel☆30Apr 12, 2018Updated 7 years ago
- Writing Observer and Learning Observer: A system for monitoring learning process data, with an initial focus on writing process data from…☆12Feb 24, 2026Updated last week
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…☆35Aug 15, 2023Updated 2 years ago
- This repository contains all manually labeled data from the GermEval-2018 shared task.☆29Sep 28, 2018Updated 7 years ago
- Using the function read.table() to break file into chunks to loop and process them. This allows processing files of any size beyond what …☆10Aug 19, 2014Updated 11 years ago
- Materials for a Python course for students of the "Computational Linguistics" continuing professional education program at HSE University (2020-2021)☆11Feb 21, 2022Updated 4 years ago
- ☆10Jun 24, 2020Updated 5 years ago
- ☆10Jul 6, 2023Updated 2 years ago
- mReasoner is a unified computational implementation of the model theory of thinking and reasoning☆13Aug 17, 2023Updated 2 years ago
- German sentiment analysis☆13Mar 8, 2017Updated 8 years ago
- Prompt-and-Rerank: A Method for Zero-Shot and Few-Shot Textual Style Transfer☆36Oct 2, 2022Updated 3 years ago
- 📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF☆39Mar 8, 2022Updated 3 years ago
- Code that drives the public web-based tools for the Media Cloud Online News Archive and Directory.☆11Updated this week
- ☆13Apr 24, 2023Updated 2 years ago
- ☆11Jan 27, 2026Updated last month
- Automatic Detection of Potentially Idiomatic Expressions☆12Feb 19, 2021Updated 5 years ago
- A short demo of (r)Ollama☆11Oct 17, 2024Updated last year
- Containerfile for the Vanilla OS Desktop+Nvidia image.☆16Feb 5, 2026Updated 3 weeks ago
- Fake NEWS detector using LIAR dataset.☆11Aug 19, 2019Updated 6 years ago
- Vossian Antonomasia☆10Oct 17, 2025Updated 4 months ago
- ☆11Mar 31, 2023Updated 2 years ago
- C4RepSet: Representative Subset from C4 data for Training Pre-trained LMs☆11Jan 13, 2023Updated 3 years ago
- Security research organization dedicated to finding low-hanging, critical vulnerabilities.☆15May 12, 2022Updated 3 years ago
- Code and data for the Walert large language model-based chatbot☆12Aug 14, 2025Updated 6 months ago
- Twitter Dataset and Finetuned Transformer Model for Turkish Sentiment Analysis☆14Jul 29, 2022Updated 3 years ago
- ☆12Jun 29, 2025Updated 8 months ago
- ☆10Jan 5, 2022Updated 4 years ago
- TREC Core track☆11Jul 5, 2017Updated 8 years ago
- synchronous and asynchronous event-based C++ executor library☆13Sep 25, 2016Updated 9 years ago
- Platform for sharing complex information about security forces. Powers WhoWasInCommand.com☆10Mar 1, 2024Updated 2 years ago
- APIs for accessing digital objects in the collections of the Royal Danish Library☆11Mar 14, 2023Updated 2 years ago
- ☆11Feb 13, 2026Updated 2 weeks ago