9 Free Machine Learning Datasets
A collection of various datasets for machine learning.

UCI Machine Learning Repository
The UC Irvine Machine Learning Repository currently maintains 335 data sets. You can view all of them through the searchable interface. The old web site is still available for those who prefer the old format.

Landsat on AWS
Landsat 8 data is available for anyone to use via Amazon S3. All Landsat 8 scenes from 2015 are available, along with a selection of cloud-free scenes from 2013 and 2014. New Landsat 8 scenes are added each day, often within hours of production. MathWorks has created a freely downloadable tool for accessing, processing, and visualizing Landsat on AWS data in MATLAB. With this tool, you can create a map display of scene locations with markers that show each scene's metadata.

Modeling Online Auctions
Modeling Online Auctions provides data sets from eBay. All files are available in comma-separated (CSV) format, with descriptions of the data fields.

Million Song Dataset
The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks.

Delve Datasets
The Delve datasets and families are available from this page. Every dataset (or family) has a brief overview page, and many also have detailed documentation. The datasets can be downloaded as gzipped tar files. They are categorized as primarily assessment, development, or historical, according to their recommended use.

KEEL-dataset
The KEEL-dataset repository aims to provide machine learning researchers with a set of benchmarks for analyzing the behavior of learning methods. Specifically, it offers benchmarks already in KEEL format for classification (standard, multi-instance, or imbalanced data), semi-supervised classification, regression, time series, and unsupervised learning. The repository also maintains a set of low-quality data benchmarks.

1000 Genomes Project and AWS
The 1000 Genomes Project is an international collaboration that has established the most detailed catalogue of human genetic variation, including SNPs, structural variants, and their haplotype context. The final phase of the project sequenced more than 2500 individuals from 26 different populations around the world and produced an integrated set of phased haplotypes with more than 80 million variants for these individuals. The Amazon mirror contains the complete data set from the project; the data can be found at s3.amazonaws.com/1000genomes.

Mammographic Image Analysis
Links to various mammography datasets, including the Mammographic Image Analysis Society (MIAS) database and the Digital Database for Screening Mammography (DDSM).

Auton Lab Datasets
Various datasets, including Alias Detection, Link, Logistic Regression, and Optimal Reinsertion datasets.
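
Two of the download formats mentioned above are CSV files (Modeling Online Auctions) and gzipped tar archives (Delve). As a minimal sketch, assuming Python and only its standard library, the snippet below reads both. The in-memory archive it builds, the member name demo/auctions.csv, and the column names are made-up placeholders for illustration, not the actual layout of any dataset listed here.

```python
import csv
import io
import tarfile


def read_csv_rows(text):
    """Parse CSV text that has a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def read_tar_gz_members(archive_bytes):
    """Return {member name: file bytes} for the regular files in a .tar.gz."""
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile():
                contents[member.name] = archive.extractfile(member).read()
    return contents


def make_demo_archive():
    """Build a tiny in-memory .tar.gz so the sketch runs without any download.

    The file name and columns are hypothetical stand-ins, not real dataset fields.
    """
    payload = b"auction_id,bid,bidder\n1,10.5,alice\n1,12.0,bob\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="demo/auctions.csv")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    files = read_tar_gz_members(make_demo_archive())
    rows = read_csv_rows(files["demo/auctions.csv"].decode("utf-8"))
    for row in rows:
        print(row["auction_id"], row["bid"], row["bidder"])
```

To use this with a real download, pass the bytes of the downloaded archive to read_tar_gz_members instead of calling make_demo_archive, and check each dataset's documentation for its actual file names, delimiter, and text encoding.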