Textprep is an analysis tool for parallel and non-parallel corpora and for the downstream Natural Language Processing and Machine Translation tasks built on them. It is designed especially for logographic languages such as Chinese and Japanese.
☆32 · last updated Feb 25, 2019
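Chinese and Japanese text is not delimited by spaces, so even basic corpus statistics depend on how a "token" is defined. The following is a minimal, hypothetical sketch, not textprep's API. It assumes character-level tokens for the logographic side and whitespace tokens for a space-delimited side. The names `char_tokens`, `word_tokens`, `SideStats`, `side_stats`, and `parallel_stats` are invented for illustration. `side_stats` alone covers a non-parallel (monolingual) corpus, and `parallel_stats` adds a per-pair length ratio for sentence-aligned data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple

Tokenizer = Callable[[str], List[str]]


@dataclass
class SideStats:
    """Summary statistics for one side of a corpus (hypothetical structure)."""
    sentences: int
    tokens: int
    vocab_size: int


def char_tokens(sentence: str) -> List[str]:
    # Logographic scripts such as Chinese and Japanese do not separate words
    # with spaces, so this sketch treats every non-whitespace character as a token.
    return [ch for ch in sentence if not ch.isspace()]


def word_tokens(sentence: str) -> List[str]:
    # Plain whitespace tokenization for space-delimited languages such as English.
    return sentence.split()


def side_stats(sentences: List[str], tokenize: Tokenizer) -> SideStats:
    """Count sentences, tokens, and distinct tokens; works on a monolingual corpus."""
    counts: Counter = Counter()
    for s in sentences:
        counts.update(tokenize(s))
    return SideStats(len(sentences), sum(counts.values()), len(counts))


def parallel_stats(
    pairs: List[Tuple[str, str]], src_tokenize: Tokenizer, tgt_tokenize: Tokenizer
) -> Tuple[SideStats, SideStats, float]:
    """Stats for both sides plus the mean target/source token-length ratio."""
    srcs = [s for s, _ in pairs]
    tgts = [t for _, t in pairs]
    # Skip pairs whose source side is empty to avoid division by zero.
    ratios = [
        len(tgt_tokenize(t)) / len(src_tokenize(s))
        for s, t in pairs
        if src_tokenize(s)
    ]
    mean_ratio = sum(ratios) / len(ratios) if ratios else 0.0
    return side_stats(srcs, src_tokenize), side_stats(tgts, tgt_tokenize), mean_ratio
```

For example, `parallel_stats([("我喜欢猫", "I like cats"), ("猫很可爱", "Cats are cute")], char_tokens, word_tokens)` yields 8 source tokens over 7 distinct characters, 6 target tokens over 6 distinct words (case-sensitive), and a mean length ratio of 0.75. A real tool would likely swap `char_tokens` for a proper word segmenter. Character-level counting is used here only because it needs no external dependency.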
Alternatives and similar repositories for textprep
Users interested in textprep also compare it to the libraries listed below.
- We use phonetics as a feature to create a joint semantic-phonetic embedding and improve the neural machine translation between Chinese an… ☆12 · last updated Aug 3, 2021
- BERT models for many languages created from Wikipedia texts ☆33 · last updated May 25, 2020
- Front-end display of a knowledge graph with Django+echarts+py2neo ☆16 · last updated Oct 13, 2020
- Re-implementation of Word2Vec using TensorFlow v2 Estimators and Datasets ☆11 · last updated Mar 25, 2023
- Syntactic evaluation sets, attribute-varying grammars, and code for replicating the CLAMS paper (ACL 2020) ☆17 · last updated Nov 26, 2024
- Cynical data selection ☆20 · last updated Jan 16, 2021
- Resources for the OpenNMT hackathon ☆51 · last updated May 24, 2019
- Deep Inside-Outside Recursive Autoencoder ☆89 · last updated Jan 17, 2022
- ☆22 · last updated Oct 26, 2020
- Benchmarks for LLM tokenizers ☆17 · last updated Jan 15, 2026
- Code for the COLING paper "A Hybrid Model of Classification and Generation for Spatial Relation Extraction" ☆10 · last updated Oct 20, 2022
- Experimenting with GANs in TensorFlow/Keras ☆10 · last updated Jan 13, 2022
- Simple, beautiful discussion forums for customer support, news aggregation, Q&A sites, and online communities ☆56 · last updated Dec 9, 2012
- Benchmark functions for Bayesian optimization ☆37 · last updated Mar 12, 2024
- ☆34 · last updated Nov 29, 2016
- Dependency or Span, End-to-End Uniform Semantic Role Labeling ☆32 · last updated Nov 23, 2018
- Multilingual and multi-domain (specialized for biomedical data) translation model ☆40 · last updated Nov 17, 2020
- A High-Quality Multilingual Dataset for Structured Documentation Translation ☆37 · last updated May 1, 2025
- A visual and interactive scoring environment for machine translation systems ☆32 · last updated May 30, 2018
- ☆12 · last updated Dec 19, 2023
- Analysis guide for automatic classification of Seoul civil complaint data (Seoul Digital Foundation) ☆12 · last updated Apr 3, 2021
- A responsive, browser-compatible video player ☆53 · last updated Apr 18, 2017
- PyTorch code for the EMNLP 2020 paper "Embedding Words in Non-Vector Space with Unsupervised Graph Learning" ☆41 · last updated Feb 18, 2021
- Computing various norms/measures on over-parametrized neural networks ☆50 · last updated Nov 26, 2018
- ☆11 · last updated Feb 22, 2022
- Read, write, and manipulate code which reads, writes, and manipulates code ☆10 · last updated Mar 15, 2020
- ☆21 · last updated May 28, 2024
- LaTeX Beamer template in the corporate design of the University of Amsterdam ☆13 · last updated Dec 7, 2015
- Interact with remote git checkouts using Fork, and more! ☆12 · last updated Oct 22, 2024
- TensorFlow operation wrapper for cppjieba (Chinese word segmentation) ☆10 · last updated Oct 21, 2019
- A tool to fetch images of pretty girls in your command line ☆28 · last updated Aug 20, 2014
- ☆12 · last updated Nov 30, 2022
- Large-scale topic discovery with Sampled-MinHashing ☆10 · last updated Jul 3, 2019
- Data generator for training Tesseract OCR ☆10 · last updated Jul 7, 2020
- Stream torrents to VLC using Peerflix, and torrent from your terminal ☆10 · last updated Feb 15, 2018
- Benchmarks for Business Document Foundation Models ☆10 · last updated Apr 4, 2024
- Package to parse and analyze trademark data from the United States Patent and Trademark Office ☆14 · last updated Apr 5, 2017
- ☆10 · last updated Aug 13, 2012
- ☆11 · last updated Aug 12, 2020