Datasets

Datasets and dataset tools for training models.

HuggingFace Datasets
Community-contributed datasets hosted on the Hugging Face Hub.

Wikipedia Datasets
Datasets made by scraping Wikipedia. These contain embeddings and metadata.

Augment Toolkit
Generates chatbot training data.

Data Juicer
Refines datasets, for example by cleaning, filtering, and deduplicating them.