llm-jp/text2dataset

Easily turn large English text datasets into Japanese text datasets using open LLMs.

☆30

Alternatives and similar repositories for text2dataset

Users who are interested in text2dataset are comparing it to the libraries listed below.

yuyay / DEIM2022_XAI_tutorial
☆12 · Feb 27, 2022 · Updated 4 years ago
opensource-jp / Open-Source-AI
Japanese translation of Open Source AI Definition
☆27 · Nov 15, 2024 · Updated last year
okoge-kaz / moe-recipes
Ongoing research on training Mixture-of-Experts models.
☆22 · Sep 16, 2024 · Updated last year
abap34 / JITrench.jl
[WIP] Lightweight automatic differentiation and deep learning framework implemented in pure Julia.
☆30 · Feb 29, 2024 · Updated 2 years ago
ujiuji1259 / shinra-attribute-extraction
☆11 · Sep 7, 2021 · Updated 4 years ago
kunishou / oasst1-89k-ja
☆16 · Nov 19, 2023 · Updated 2 years ago
ymd-h / vulkpy
GPGPU array on Vulkan
☆17 · Jun 3, 2023 · Updated 3 years ago
speed1313 / jax-llm
JAX implementation of large language models. You can train a GPT-2-like model with Aozora Bunko (青空文庫, the aozora bunko-clean dataset) or any other text dat…
☆13 · Aug 5, 2024 · Updated last year
hotchpotch / JQaRA
JQaRA: Japanese Question Answering with Retrieval Augmentation - a Japanese Q&A dataset for evaluating retrieval-augmented generation (RAG)
☆44 · Sep 9, 2025 · Updated 10 months ago
UEC-InabaLab / KokoroChat
A Japanese counseling dialogue dataset collected through role-play
☆23 · May 3, 2026 · Updated 2 months ago
kaisugi / spotify-wordcloud
Visualize, share, and keep your favorite music artists on the Web
☆14 · May 23, 2023 · Updated 3 years ago
line / japanese-large-lm-instruction-sft
☆16 · Aug 14, 2023 · Updated 2 years ago
laboroai / Laboro-ParaCorpus
Scripts for creating a Japanese-English parallel corpus and training NMT models
☆19 · Nov 9, 2021 · Updated 4 years ago
yuzu-ai / japanese-llm-ranking
☆50 · Apr 10, 2024 · Updated 2 years ago
llm-jp / llm-jp-eval
☆164 · Updated this week
stockmarkteam / ner-wikipedia-dataset
A Japanese named entity recognition dataset built from Wikipedia
☆143 · Sep 2, 2023 · Updated 2 years ago
Qulacs-Osaka / scikit-qulacs
scikit-qulacs is a library for quantum neural networks. It is based on qulacs and named after scikit-learn.
☆25 · Updated this week
shisa-ai / shisa-v2
Japanese / English Bilingual LLM
☆34 · Dec 23, 2025 · Updated 7 months ago
de9uch1 / semsis
A library for semantic similarity search
☆26 · Jan 31, 2025 · Updated last year
kunishou / GenerativeAI-Cost
☆16 · Jan 3, 2025 · Updated last year
yoichi1484 / subspace
An implementation of "Subspace Representations for Soft Set Operations and Sentence Similarities" (NAACL 2024)
☆10 · May 31, 2024 · Updated 2 years ago
nu-dialogue / real-persona-chat
RealPersonaChat: A Realistic Persona Chat Corpus with Interlocutors' Own Personalities
☆66 · Mar 13, 2024 · Updated 2 years ago
hppRC / simple-simcse-ja
Exploring Japanese SimCSE
☆69 · Oct 31, 2023 · Updated 2 years ago
AUGMXNT / shisa
☆43 · Mar 30, 2024 · Updated 2 years ago
studio-ousia / ease
☆57 · Jun 3, 2023 · Updated 3 years ago
SakanaAI / EDINET-Bench
[ICLR 2026] Evaluating the performance of LLMs on challenging Japanese financial tasks.
☆34 · Mar 6, 2026 · Updated 4 months ago
maekawatoshiki / altius
Small ONNX inference runtime written in Rust
☆102 · Feb 6, 2026 · Updated 5 months ago
hppRC / llm-translator
Mixtral-based Ja-En (En-Ja) translation model
☆20 · Jan 6, 2025 · Updated last year
TsuyoshiOkubo / Introduction-to-Tensor-Network
☆26 · May 27, 2025 · Updated last year
softdevteam / libgc
A library for garbage collection in Rust.
☆14 · Apr 23, 2021 · Updated 5 years ago
luismede / netherite.nvim
A Neovim plugin for quick notes that sync with your Obsidian vault. Write fast, sync seamlessly.
☆15 · Jul 15, 2026 · Updated last week
nerab / sony-camera-remote
Ruby wrapper for the Sony Camera Remote API
☆10 · Feb 2, 2014 · Updated 12 years ago
miyagawa / resque-top
top for Resque
☆23 · Mar 5, 2012 · Updated 14 years ago
megagonlabs / holobench
🫧 Code for Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data (Maekawa*, Iso* et al.…
☆12 · Feb 25, 2025 · Updated last year
yahoojapan / JGLUE
JGLUE: Japanese General Language Understanding Evaluation
☆346 · Mar 31, 2025 · Updated last year
primenumber / kazoeage-oneesan-cuda
GPGPU version of Kazoeage Oneesan (数え上げお姉さん, https://github.com/primenumber/kazoeage-oneesan)
☆11 · Dec 3, 2021 · Updated 4 years ago
Aratako / Japanese-RP-Bench
☆19 · Sep 29, 2024 · Updated last year
turingmotors / heron
Heron is a library that seamlessly integrates multiple Vision and Language models, as well as Video and Language models.
☆177 · Jun 13, 2024 · Updated 2 years ago
hitachi-nlp / FLD-corpus
☆19 · Dec 6, 2024 · Updated last year