sharejing/Takin

A Python toolkit for file processing, text cleaning, and data splitting.

☆36

Alternatives and similar repositories for Takin

Users who are interested in Takin are comparing it to the libraries listed below.

terryqj0107 / GECOR
Source code and dataset for the paper "GECOR: An End-to-End Generative Ellipsis and Co-reference Resolution Model for Task-Oriented Dialo…
☆30 · Jul 22, 2023 · Updated 3 years ago
adetion / txtfilemerge
TXT text corpus data cleaning: 1> merge TXT files; 2> filter out noise strings; 3> mask person, place, and organization names; 4> convert other encodings to UTF-8
☆19 · Oct 14, 2022 · Updated 3 years ago
pany8125 / ShareGPTQAExtractor-mnbvc
MNBVC project: ShareGPT corpus cleaning
☆16 · Oct 4, 2023 · Updated 2 years ago
zejunwang1 / gpt2ppl-zh
Sentence perplexity computation based on a Chinese GPT2 pretrained model
☆15 · Apr 20, 2023 · Updated 3 years ago
hscspring / pnlp
NLP pre/post-processing tools.
☆30 · Mar 31, 2025 · Updated last year
xueyouluo / wiki-error-extract
Extracts an error-correction corpus from Wikipedia edit history data.
☆12 · Apr 6, 2022 · Updated 4 years ago
chloyee / FilmReviewAnalysis
Crawls Douban film reviews with the scrapy framework, cleans and analyzes the data with Python, then visualizes the results
☆15 · Sep 5, 2020 · Updated 5 years ago
jason-r-becker / LIBS
Analysis codes for Laser-Induced Breakdown Spectroscopy data
☆10 · Aug 19, 2017 · Updated 8 years ago
vishwajeet93 / clqg
code for ACL 2019 paper "cross lingual training for automatic question generation"
☆14 · Jun 30, 2019 · Updated 7 years ago
HeartQiann / weibo_visualization
Visualization platform for Weibo public opinion and user behavior
☆23 · Mar 27, 2023 · Updated 3 years ago
hiDaDeng / Tool_Kits
Covers web crawling, databases, data analysis, machine learning, visualization, text analysis, GUI, and office automation
☆14 · Jan 14, 2022 · Updated 4 years ago
yuchenlin / ParaGEN
Neural Paraphrase Generation based on OpenNMT-py
☆12 · Jan 2, 2018 · Updated 8 years ago
Feuoy / weibo-topic
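Simple Weibo topic analysis: topic crawling, high-frequency word extraction, word cloud generation, and sentiment scoring, using python + selenium + jieba + snownlp + wordcloud
☆33 · Jan 28, 2021 · Updated 5 years ago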
Miopas / dstc_rnn
Implements an RNN model for the DSTC2 task
☆16 · Jan 25, 2019 · Updated 7 years ago
niro8 / weibo_crawler
Weibo crawler based on keyword search results
☆31 · Nov 6, 2018 · Updated 7 years ago
kevinduh / san_mrc
Stochastic Answer Networks (SAN) for Machine Reading Comprehension
☆148 · Nov 26, 2018 · Updated 7 years ago
baoy-nlp / FAParser
A Fast(er) and Accurate Syntactic Parsing by Exacter Searching.
☆17 · Jul 25, 2024 · Updated last year
AtmaHou / Bi-LSTM_PosTagger
An easy-to-use sequence labeling project (gets SoA on ATIS data) with pytorch
☆15 · Nov 21, 2018 · Updated 7 years ago
JayYip / deep-learning-nlp-notes
Notes on deep learning and NLP
☆27 · Jun 17, 2019 · Updated 7 years ago
elnaaz / GCE-Model
Toward Scalable Neural Dialogue State Tracking Model
☆20 · Sep 23, 2022 · Updated 3 years ago
PhantomGrapes / MGeoExample
Examples about using MGeo finetune models
☆57 · Feb 9, 2023 · Updated 3 years ago
lemonsis / MDD-5k
Official Implementation of AAAI 2025 paper "MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symb…
☆52 · Dec 8, 2025 · Updated 7 months ago
beikwx / sailVina_Linux
sailVina reverse docking scripts for Linux
☆10 · Feb 14, 2021 · Updated 5 years ago
mbalesni / deepspeed_llama
Finetuning LLaMA with DeepSpeed
☆10 · Apr 14, 2023 · Updated 3 years ago
NaturalCutie / Python-Data-Analysis-Notes
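Notes based on the Python data analysis course by @林粒粒呀 on Bilibili, covering Python fundamentals as well as data loading, assessment, cleaning, analysis, and visualization
☆52 · Jul 6, 2024 · Updated 2 years ago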
NLP-Tutorials / AACL-IJCNLP2022-KGC-Tutorial
Materials for AACL-IJCNLP-2022 tutorial: Efficient and Robust Knowledge Graph Construction
☆28 · Feb 3, 2023 · Updated 3 years ago
James-Yip / TGODC-DKRN
The source code of the paper 'Dynamic Knowledge Routing Network For Target-Guided Open-Domain Conversation'
☆24 · Mar 24, 2023 · Updated 3 years ago
LiamAttClarke / svg-plotter
Convert SVG files to GeoJSON
☆11 · Feb 17, 2026 · Updated 5 months ago
Asthestarsfalll / Sparse_MultiLabel_Categorical_CrossEntropy
Sparse Multilabel Categorical Crossentropy
☆11 · Sep 10, 2023 · Updated 2 years ago
tranhungnghiep / AnalyzeKGE
Analyzing knowledge graph embedding methods, including TransE, DistMult, CP, SimplE, ComplEx, Quaternion
☆28 · May 23, 2023 · Updated 3 years ago
colin4k / mistral-ocr-app
Python code & Cloudflare worker for Mistral-OCR
☆12 · Mar 8, 2025 · Updated last year
momo-journey / CDial-GPT-NEZHA
PyTorch version of Chinese multi-turn Cdial based on gpt+nezha
☆11 · Oct 22, 2022 · Updated 3 years ago
waylandzhang / train_tokenizer
A demonstration of how to train a custom tokenizer similar to TikToken.
☆15 · Jan 6, 2025 · Updated last year
rbawden / DiaBLa-dataset
English-French MT dialogue dataset
☆17 · Apr 29, 2022 · Updated 4 years ago
yl-wang996 / ToolEENet
ToolEENet: Tool Affordance 6D Pose Estimation
☆12 · Jun 29, 2024 · Updated 2 years ago
freesunshine0316 / MPQG
Code corresponding to our paper "Leveraging Context Information for Natural Question Generation"
☆46 · Oct 8, 2019 · Updated 6 years ago
Shikib / structured_fusion_networks
Code for SIGDial 2019 Best Paper: Structured Fusion Networks for Dialog https://arxiv.org/abs/1907.10016
☆30 · Aug 19, 2019 · Updated 6 years ago
aniket0511 / Sigmoid-Function
Hardware Implementation of Sigmoid Function using verilog HDL
☆16 · Dec 16, 2019 · Updated 6 years ago
whxf / awesome-chinese-nlp
This project collects commonly used Chinese NLP resources, including tools, data, learning resources, and common models.
☆34 · Dec 11, 2019 · Updated 6 years ago