The links below point to places where you can find a variety of existing datasets.
Machine Learning Dataset Collections
We start with a list of machine learning dataset repositories.
UCI Machine Learning Dataset Repository | [HTML] | A repository of over 100 datasets used in ML research. |
SNAP: Stanford Large Network Dataset Collection | [HTML] | A curated collection of social network and other graph data |
Kaggle | [HTML] | A collection of open source datasets used for data science |
KDNuggets | [HTML] | An aggregator of dataset repositories |
Datasets.co | [HTML] | A small collection of well-known datasets |
KDD Cup Datasets | [HTML] | Datasets used in competitions run by the KDD (Knowledge Discovery and Data Mining) conference |
Data Sets for data mining | [HTML] | A collection of classical data mining datasets from the University of Edinburgh |
Government and business datasets
Data.gov | [HTML] | US Government data portal |
US Census Bureau | [HTML] | Demographic data on US population |
UK Government data | [HTML] | UK Government Data Portal |
Natural Language Processing Datasets
Microsoft NLP data | [HTML] | Corpora for natural language processing tasks released by Microsoft Research |
Stanford NLP data | [HTML] | Datasets and software released by the Stanford NLP research group |
Cornell NLP data | [HTML] | Collection of NLP corpora from Cornell University |