google-research-datasets / Hinglish-TOP-Dataset

Contains the largest (10K) human-annotated code-switched semantic parsing dataset and 170K utterances generated using the CST5 augmentation technique. Queries are derived from TOPv2, a multi-domain, task-oriented semantic parsing dataset. Tests suggest that with CST5, up to 20x less labeled data can achieve the same semantic parsing performance.