"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"pca=PCA()\n",
"pca.fit(X)\n",
"v=pca.explained_variance_\n",
"print(v)\n",
"plt.plot(np.arange(1,11), np.cumsum(v));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Exercise 8 (explained variance)
\n",
"\n",
"This exercise can give two points at maximum!\n",
"\n",
"Part 1.\n",
"\n",
"Write function `explained_variance` which reads the tab separated file \"data.tsv\". The data contains 10 features. Then fit PCA to the data. The function should return two lists (or 1D arrays). The first list should contain the variances of all the features. The second list should consist of the explained variances returned by the PCA.\n",
"\n",
"In the main function print these values in the following form:\n",
"```\n",
"The variances are: ?.??? ?.??? ...\n",
"The explained variances after PCA are: ?.??? ?.??? ...\n",
"```\n",
"Print the values with three decimal precision and separate the values by a space.\n",
"\n",
"Part 2.\n",
"\n",
"Plot the cumulative explained variances. The y-axis should be the cumulative sum, and the x-axis the number of terms in the cumulative sum.\n",
"
"
]
},
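{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of one way the solution could be structured (the presence of a header row in `data.tsv` is an assumption here, and this is only a sketch, not a complete graded solution):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"from sklearn.decomposition import PCA\n",
"\n",
"def explained_variance():\n",
"    df = pd.read_csv(\"data.tsv\", sep=\"\\t\")  # assumes a header row\n",
"    v1 = df.var(axis=0).values               # per-feature variances\n",
"    pca = PCA()\n",
"    pca.fit(df.values)\n",
"    v2 = pca.explained_variance_             # variances along principal axes\n",
"    return v1, v2\n",
"\n",
"def main():\n",
"    v1, v2 = explained_variance()\n",
"    print(\"The variances are:\", \" \".join(f\"{x:.3f}\" for x in v1))\n",
"    print(\"The explained variances after PCA are:\", \" \".join(f\"{x:.3f}\" for x in v2))\n",
"```\n",
"\n",
"Note that the sum of the per-feature variances equals the sum of the PCA explained variances: PCA only rotates the coordinate axes, so the total variance is preserved."
]
},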
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary (week 6)\n",
"\n",
"* We got to know another supervised learning method, namely, naive Bayes classification\n",
"* We saw examples of naive Bayes classification where either Gaussian or multinomial distribution was used to model the features of samples belonging to a class\n",
"* We saw how to use cross validation to asses prediction abilities of a model. This allows us to be sure that the model is not overfitting.\n",
"* In the clustering section we saw examples of using k-means, DBSCAN, and hierarchical clustering methods. They have different approaches to clustering, and each have different strengths.\n",
"* Clustering is based on the notion of distance between the points in the data.\n",
"* Principal component analysis is another example of unsupervised learning\n",
"* It can reduce the dimensionality of a data by throwing away those dimensions where the variability is low."
]
},
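{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration of the last point (the random data `X` and the choice of two components are only for demonstration):\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"\n",
"rng = np.random.RandomState(0)\n",
"X = rng.randn(200, 10)      # illustrative data with 10 features\n",
"pca = PCA(n_components=2)   # keep the two highest-variance directions\n",
"Z = pca.fit_transform(X)    # project the data onto them\n",
"print(Z.shape)              # (200, 2)\n",
"```"
]
},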
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}