Chinese and English corpus data cleaning, with distributed sentence-splitting and word-segmentation preprocessing
☆12 · Mar 28, 2020 · Updated 5 years ago
Alternatives and similar repositories for dataProcessor
Users interested in dataProcessor are comparing it to the repositories listed below
- Deduplicating massive text collections with Simhash ☆12 · Jun 2, 2018 · Updated 7 years ago
- Source code for ACL 2020 paper "A Span-based Linearization for Constituent Trees" ☆13 · Jan 12, 2022 · Updated 4 years ago
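- Course-selection crawler for Yunnan University that provides open-seat alerts and implements automatic course grabbing ☆21 · Jan 19, 2026 · Updated 2 months ago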
- Implement ARM NEON intrinsics in C++ ☆22 · May 14, 2024 · Updated last year
- A personal knowledge-base Q&A system using retrieval-augmented generation (RAG), built on SpringAi and SpringAiAlibaba ☆20 · May 10, 2025 · Updated 10 months ago
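- A full-stack AI agent application platform based on Java Spring Boot and Vue3, integrating LLM chat, a RAG knowledge base, autonomous agent planning, tool-chain invocation, MCP services, and other cutting-edge AI technologies. The platform supports multi-turn dialogue, knowledge retrieval, and automated task execution, and is suited to AI application development, intelligent assist… ☆23 · Jun 26, 2025 · Updated 8 months ago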
- A paper list on multilingual pre-trained models (continually updated). ☆24 · Jun 18, 2024 · Updated last year
- Text quality classification for MNBVC using fastText ☆16 · Oct 2, 2023 · Updated 2 years ago
- Automatically push daily news with AI ☆13 · Updated this week
- A free Clash subscription URL ☆12 · Apr 22, 2022 · Updated 3 years ago
- Scrapy + selenium/webdriver + random User-Agent + IP proxy + twisted ConnectionPool + mysql: a full-site crawler for a site the author refers to only as 某书 ☆15 · Dec 8, 2022 · Updated 3 years ago
- Code for "A Unified Model for Joint Chinese Word Segmentation and Dependency Parsing" ☆39 · May 24, 2022 · Updated 3 years ago
- Scrapes bibliographic information from CNKI pages and saves it to Excel ☆19 · Jan 7, 2019 · Updated 7 years ago
- SDL2# - C# Wrapper for SDL2 ☆48 · Feb 12, 2025 · Updated last year
- Fine-tune multiple pre-trained Transformer-based models to solve the Vietnamese fake news detection problem (ReINTEL) in the VLSP2020 shared task ☆18 · Dec 16, 2020 · Updated 5 years ago
- TXT text corpus data cleaning: 1> merge TXT files; 2> filter out noise strings; 3> mask person names, place names, and organization names; 4> convert other encodings uniformly to UTF-8 ☆19 · Oct 14, 2022 · Updated 3 years ago
- Small book information management system, developed individually ☆10 · Jan 15, 2019 · Updated 7 years ago
- Cube-UI demo for a nuxt.js application ☆14 · Jun 27, 2019 · Updated 6 years ago
- Must-memorize: 8000 spoken English sentences ☆11 · Jul 21, 2022 · Updated 3 years ago
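- A hands-on intelligent AI interaction system 🤖 project based on Spring Boot 3.4.5 + Java 21 + Spring AI + LangChain4j + DashScope + Ollama, suited to AI application development, agent building, and similar scenarios. The project goes from basic model invocation to RAG knowledge-base Q&A 📚 and then to tool… ☆67 · Jul 7, 2025 · Updated 8 months ago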
- Telegram bot for quickly downloading from Anna's Archive ☆11 · Dec 5, 2022 · Updated 3 years ago
- A tiny script to convert your mdx dictionary file to CSV ☆11 · Dec 22, 2018 · Updated 7 years ago
- Genetic algorithm for solving function extremum problems ☆10 · Aug 7, 2016 · Updated 9 years ago
- ☆23 · Dec 6, 2018 · Updated 7 years ago
- Cantonese Shuangpin input method: an input method for typing Chinese using Cantonese pronunciations with 2-3 keys per character, based on RIME ☆11 · Jul 25, 2021 · Updated 4 years ago
- An Android dictionary application with support for mdx format. ☆11 · Jan 7, 2023 · Updated 3 years ago
- 5th-place solution for the 2019 Language and Intelligence Technology Competition ☆14 · Dec 2, 2019 · Updated 6 years ago
- VnDT: A Vietnamese Dependency Treebank ☆24 · Nov 6, 2021 · Updated 4 years ago
- A simple public blockchain developed in Golang ☆13 · Oct 9, 2018 · Updated 7 years ago
- Automatically add explanations of unfamiliar words in ebooks ☆15 · Feb 9, 2023 · Updated 3 years ago
- Source code for "Training Generative Adversarial Networks Via Turing Test". ☆13 · May 29, 2020 · Updated 5 years ago
- An English dictionary. ☆10 · May 31, 2016 · Updated 9 years ago
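- Early computers used 7-bit ASCII encoding. To handle Chinese characters, programmers designed GB2312 for Simplified Chinese and Big5 for Traditional Chinese. GB2312 (1980) contains 7,445 characters in total, including 6,763 Chinese characters and 682 other symbols. In the Chinese character area, the internal code ranges from B0-F7 for the high byte and A1-FE for the low byte, and the occupied code… ☆10 · Sep 10, 2017 · Updated 8 years ago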
- Proteogenomics database-generation tool for protein haplotypes and variants ☆11 · Jun 19, 2025 · Updated 9 months ago
- A small project for learning and practice: an SSM library management system ☆17 · Jun 14, 2019 · Updated 6 years ago
- ☆18 · Updated this week
- Adds synonym support on top of the IK Chinese word segmenter ☆13 · Feb 24, 2018 · Updated 8 years ago
- A searchable Chinese / English dictionary with helpful utilities. ☆12 · Feb 24, 2024 · Updated 2 years ago
- A text file containing English words, along with the definition, parts of speech (noun, verb, adjective, etc.), and a link to the url where … ☆13 · Apr 27, 2024 · Updated last year