A multi-threaded crawler for collecting idiom data
☆14Dec 11, 2020Updated 5 years ago
Alternatives and similar repositories for Crawl
Users that are interested in Crawl are comparing it to the libraries listed below.
- Python class for Elasticsearch, including add, batch add, update, delete, query, and scan query. Also with a demo that put Wikipedia in…☆17Sep 3, 2022Updated 3 years ago
- A BERT-based baseline for the Chinese idiom cloze test with PyTorch.☆18Dec 24, 2020Updated 5 years ago
- A wrapper class for the tf-idf model: it computes tf-idf values for all documents and implements a tf-idf-based search engine. Given a query, it computes the similarity to each document and returns the top-k documents most similar to the query.☆16Nov 20, 2020Updated 5 years ago
- semantic similarity, word2vec + wmd, bert+wmd, pytorch☆31Jan 29, 2024Updated 2 years ago
- Datafountain-Epidemic government affairs quiz assistant competition. We divided this task into two parts: document retrieval and answer e…☆14Aug 21, 2022Updated 3 years ago
- DataFountain epidemic government affairs Q&A assistant: solution write-up☆16May 2, 2020Updated 5 years ago
- Documentation notes☆15Mar 16, 2021Updated 5 years ago
- Implementation of AAAI2021 paper "Writing Polishment with Simile: Task, Dataset and A Neural Approach"☆21Dec 25, 2020Updated 5 years ago
- Moss Vortex is a lightweight and high-performance deployment and inference backend engineered specifically for MOSS 003, providing a weal…☆37Apr 25, 2023Updated 2 years ago
- A Specialist-annotated Dataset for Medical-domain Chinese Spelling Correction☆36Jun 6, 2022Updated 3 years ago
- Reference Implementation for WSDM 2018 Paper "Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering"☆68Nov 16, 2018Updated 7 years ago
- [AAAI 2024] LLMEval Phase II dataset — professional domain evaluation across 12 academic disciplines☆71Updated this week
- This is the dataset for Chinese community medical question answering.☆114Oct 22, 2019Updated 6 years ago
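- Kesci (科赛网) LES Cup (莱斯杯): summary of slides, documents, and code from the top-ten teams of the 2nd national "Military Intelligent Machine Reading" challenge☆132Feb 5, 2020Updated 6 years ago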
- ChID: A Large-scale Chinese IDiom Dataset for Cloze Test☆150May 8, 2023Updated 2 years ago
- Neural word segmentation with rich pretraining, code for ACL 2017 paper☆164Jan 10, 2019Updated 7 years ago
- TensorFlow code and pre-trained models for BERT and ERNIE☆146Jun 5, 2019Updated 6 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.☆160Jun 18, 2024Updated last year
- Dynamic Memory Networks (https://arxiv.org/abs/1603.01417) in Tensorflow☆239Aug 10, 2016Updated 9 years ago
- ☆344Dec 11, 2018Updated 7 years ago
- Naive Bayes-based Context Extension☆328Dec 9, 2024Updated last year
- A pytorch implementation of the ACL2019 paper "Simple and Effective Text Matching with Richer Alignment Features".☆305Aug 24, 2022Updated 3 years ago
- This is an updated version of the dataset for Chinese community medical question answering.☆383Jan 9, 2019Updated 7 years ago
- Chinese BERT with words as the basic unit☆477Nov 18, 2021Updated 4 years ago
- Reject complicated operations for incorporating lexicon for Chinese NER.☆437Jan 22, 2022Updated 4 years ago
- A prize for finding tasks that cause large language models to show inverse scaling☆618Oct 11, 2023Updated 2 years ago
- Open-source release of the MuCGEC Chinese error correction dataset and SOTA text correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Gr…☆566Jun 9, 2023Updated 2 years ago
- An implementation of TransE and its extended models for Knowledge Representation Learning on TensorFlow☆513Nov 3, 2022Updated 3 years ago
- XVERSE-13B: A multilingual large language model developed by XVERSE Technology Inc.☆642Apr 9, 2024Updated 2 years ago
- 📦 Quickly convert between Chinese numerals and Arabic numerals (latest features: conversion of fractions, dates, temperatures, and more)☆758Dec 21, 2024Updated last year
- A full Python Implementation of the ROUGE Metric (not a wrapper)☆718Nov 19, 2024Updated last year
- Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".☆878Aug 20, 2024Updated last year
- Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is design…☆1,112Jul 30, 2021Updated 4 years ago
- Four word embedding models implemented in Python. Supporting arbitrary context features☆848Aug 22, 2019Updated 6 years ago
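- A collection of NLP competition strategy implementations, baselines for various tasks, competition experience posts (current, past, and practice competitions), NLP conference dates, commonly followed media accounts, GPU recommendations, and more; continuously updated☆2,243Aug 29, 2023Updated 2 years ago
- Deep learning interview questions, with answers keyed to page numbers in the Chinese edition of Deep Learning☆879Nov 2, 2017Updated 8 years ago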
- Must-read papers on Machine Reading Comprehension☆890Jul 9, 2020Updated 5 years ago
- Image Test Time Augmentation with PyTorch!☆1,028Jul 28, 2023Updated 2 years ago
- A collection of open-source datasets to train instruction-following LLMs (ChatGPT,LLaMA,Alpaca)☆1,146Jan 4, 2024Updated 2 years ago