Author: Seongju Hong
-
2020.04.26(pm): Gaussian Distribution
The Gaussian distribution is another name for the normal distribution, and it is one of the most important concepts in statistics. Last time, I introduced probability and statistics and mentioned the central limit theorem. Let's look at the central limit theorem again. Central limit theorem: The sample data sampled from a…
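The central limit theorem can be seen directly by simulation. This is my own toy sketch (not from the post): means of samples drawn from a uniform distribution cluster around the population mean, and their spread shrinks as the sample size grows.

```python
import random
import statistics

random.seed(0)

def sample_mean(n):
    """Mean of n draws from Uniform(0, 1)."""
    return statistics.fmean(random.random() for _ in range(n))

# Collect 1000 sample means for each of two sample sizes.
means_small = [sample_mean(5) for _ in range(1000)]
means_large = [sample_mean(50) for _ in range(1000)]

# Both center near the population mean 0.5, but the larger samples
# produce a much tighter, more normal-looking distribution of means.
print(statistics.fmean(means_small))  # close to 0.5
print(statistics.stdev(means_large) < statistics.stdev(means_small))  # True
```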
-
2020.04.12(pm): Regression Example: Housing price prediction
This is a summary of Chapter 4 of the deep learning book by the founder of Keras. This time, as an example of a regression problem, we will try to predict housing prices. The data we will use today is the Boston Housing Price dataset, in which the target is the median value of home prices given data such as crime…
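To make the regression setup concrete, here is a minimal sketch of my own (not the book's Keras code), using synthetic data in place of the Boston dataset: fit y ≈ w·x + b by gradient descent on mean squared error, where x stands in for a single feature such as the number of rooms.

```python
import random

random.seed(1)
xs = [random.uniform(3, 9) for _ in range(200)]          # e.g. number of rooms
ys = [2.5 * x + 4.0 + random.gauss(0, 0.5) for x in xs]  # price (arbitrary units)

w, b, lr = 0.0, 0.0, 0.02
for _ in range(2000):
    # Mean-squared-error gradients over the whole dataset.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # close to the true values w = 2.5, b = 4.0
```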
-
2020.04.05(pm): Binary Classification – Movie Review Classification
We encounter binary classification everywhere in everyday life: dogs vs. cats, 100-won coins vs. 500-won coins, iPhones vs. Samsung Galaxy phones. This time, I'm going to classify movie reviews. Binary classification is one of the most widely used tasks in machine learning. Let's…
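As a warm-up for the idea, here is my own toy binary classifier (not the movie-review code from the post): logistic regression on a single made-up feature, which you can think of as a crude "positive-word count" for a review, trained by gradient descent on the log loss.

```python
import math
import random

random.seed(2)
# Label 1 (positive) tends to have a higher feature value than label 0 (negative).
data = [(random.gauss(2.0, 1.0), 1) for _ in range(100)] + \
       [(random.gauss(-2.0, 1.0), 0) for _ in range(100)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    # Full-batch gradient of the log loss.
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / len(data)
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in data) / len(data)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = sum((sigmoid(w * x + b) > 0.5) == (y == 1) for x, y in data) / len(data)
print(accuracy)  # well above chance on this nearly separable toy data
```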
-
2020.03.29(pm): Statistical Inference and Hypothesis Testing
This time, let's look at the concepts of probability and statistics that underlie machine learning algorithms. When training a model in supervised learning, the most important step is variable selection. Numerical analysis and verification are required to ensure that these variables are well chosen. So, what is needed is the…
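To give one concrete example of hypothesis testing (my own sketch, not from the post): a one-sample z-test of H0 "the population mean is 100", assuming a known population standard deviation of 15. All numbers here are made up for illustration.

```python
import math

sample = [112, 108, 119, 104, 111, 107, 115, 109]
mu0, sigma = 100.0, 15.0

mean = sum(sample) / len(sample)
# Standard error of the mean, then the z statistic.
se = sigma / math.sqrt(len(sample))
z = (mean - mu0) / se

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

# Two-sided p-value: probability of a statistic at least this extreme under H0.
p_value = 2 * (1 - norm_cdf(abs(z)))
print(round(z, 2), round(p_value, 4))  # z ≈ 2.0, p < 0.05 → reject H0
```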
-
2020.03.21(pm): Support Vector Machine(SVM)
The SVM covered in this post is a supervised learning algorithm for solving classification problems. SVM extends the input data to build more complex models than a simple hyperplane can define. SVM can be applied to both classification and regression. Linear models and nonlinear features: Because lines and hyperplanes are not flexible, linear models…
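Before the nonlinear case, the linear SVM itself can be sketched in a few lines. This is my own toy version (not the post's code): subgradient descent on the hinge loss with L2 regularization, for 2-D points labeled ±1.

```python
import random

random.seed(3)
# Two well-separated 2-D blobs with labels +1 and -1.
pts = [((random.gauss(2, 0.5), random.gauss(2, 0.5)), +1) for _ in range(50)] + \
      [((random.gauss(-2, 0.5), random.gauss(-2, 0.5)), -1) for _ in range(50)]

w = [0.0, 0.0]
b = 0.0
lr, lam = 0.01, 0.01
for _ in range(1000):
    for (x1, x2), y in pts:
        margin = y * (w[0] * x1 + w[1] * x2 + b)
        if margin < 1:
            # Hinge-loss subgradient: only points inside the margin push on w.
            w[0] += lr * (y * x1 - lam * w[0])
            w[1] += lr * (y * x2 - lam * w[1])
            b += lr * y
        else:
            # Outside the margin: only the regularizer shrinks w.
            w[0] -= lr * lam * w[0]
            w[1] -= lr * lam * w[1]

correct = sum((w[0] * x1 + w[1] * x2 + b > 0) == (y > 0) for (x1, x2), y in pts)
print(correct, "/", len(pts))  # nearly all points correct on this separable data
```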
-
2020.03.15(pm): DBSCAN
Last time, we looked at hierarchical clustering algorithms. This time, let's look at an algorithm called DBSCAN (density-based spatial clustering of applications with noise). DBSCAN is a very useful clustering algorithm, and its main advantage is that you do not need to specify the number of clusters in advance. This algorithm can also find…
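That main advantage is easy to see in code. Here is my own minimal DBSCAN implementation, for illustration only: note that the number of clusters is never passed in, it falls out of the density of the data.

```python
import math

def dbscan(points, eps, min_pts):
    """Return a cluster label per point; -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:        # not a core point
            labels[i] = -1              # tentatively noise
            continue
        cluster += 1                    # a new cluster is discovered, not requested
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:                    # grow the cluster outward
            j = queue.pop()
            if labels[j] == -1:         # noise reachable from a core → border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts: # j is a core point too: keep expanding
                queue.extend(j_seeds)
    return labels

# Two dense blobs plus one far-away outlier.
pts = [(0, 0), (0.2, 0), (0, 0.2), (5, 5), (5.2, 5), (5, 5.2), (20, 20)]
labels = dbscan(pts, eps=0.5, min_pts=3)
print(labels)  # two clusters (0 and 1) and the outlier labeled -1
```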
-
2020.03.14(pm): Hierarchical Clustering
We learned about K-means clustering last time. K-means clustering has the limitation that it works well only when clusters have roughly uniform density and simple shapes, and you also have to specify the number of clusters you want to find in advance. Let's take a look at the clustering algorithm…
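The agglomerative idea fits in a few lines. This is my own single-linkage sketch, for illustration only: start with every point as its own cluster, then repeatedly merge the two closest clusters until the desired number remains.

```python
import math

def single_linkage(points, n_clusters):
    """Agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]  # start: each point alone
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest minimum point-to-point distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))       # merge the closest pair
    return clusters

pts = [(0, 0), (0.3, 0), (5, 5), (5.3, 5), (10, 0)]
result = sorted(single_linkage(pts, 3))
print(result)  # [[0, 1], [2, 3], [4]]
```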
-
2020.03.08(am): K-means Clustering
Unsupervised learning: There are two broad types of machine learning algorithms, supervised learning and unsupervised learning. Unsupervised learning refers to all kinds of machine learning in which an algorithm must learn without any known outputs or labels. The most difficult part of unsupervised learning is evaluating whether the algorithm has learned something useful. Unsupervised learning…
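K-means itself is short enough to sketch from scratch. This is my own minimal implementation, shown only to make the algorithm concrete: alternate between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain K-means on a list of tuples; returns the final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # naive init: k random points
    for _ in range(iters):
        # Assignment step: nearest centroid per point.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: each centroid becomes the mean of its group.
        for c, group in enumerate(groups):
            if group:
                centroids[c] = tuple(sum(v) / len(group) for v in zip(*group))
    return centroids

pts = [(0, 0), (0.2, 0.1), (0.1, 0.2), (8, 8), (8.1, 7.9), (7.9, 8.2)]
centroids = kmeans(pts, k=2)
print(sorted(centroids))  # one centroid near (0.1, 0.1), one near (8, 8)
```

Note that `k` must be chosen up front — exactly the limitation that motivates the density-based and hierarchical methods in the later posts.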
-
2020.03.01(pm): Decision Tree
Decision trees are a popular model for classification and regression problems. Basically, a decision tree learns a sequence of yes/no questions in order to reach a decision. In other words, it is a tree-structured prediction/classification model that represents the patterns inherent in the data as combinations of variables. Narrow down the…
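Here is my own minimal sketch (not the post's code) of how a tree node picks its yes/no question: try every threshold on one numeric feature and keep the split with the lowest weighted Gini impurity.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)   # fraction of class 1
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Try every question 'is x <= t?'; keep the lowest-impurity split."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]    # answered "yes"
        right = [y for x, y in zip(xs, ys) if x > t]    # answered "no"
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t)
    return best

xs = [1, 2, 3, 10, 11, 12]   # a single numeric feature, e.g. petal length
ys = [0, 0, 0, 1, 1, 1]      # class labels
score, threshold = best_split(xs, ys)
print(threshold, score)  # the question "x <= 3?" separates the classes perfectly
```

A full tree simply asks `best_split` recursively on each resulting subset until the leaves are pure.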
-
2020.01.18(pm): Feature Selection and Dimension reduction
Feature Engineering: Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. Feature engineering is fundamental to the application of machine learning, and it is both difficult and expensive. The need for manual feature engineering can be obviated by automated feature learning (source: https://en.wikipedia.org/wiki/Feature_engineering)…