# Feature Scaling In Machine Learning

Before applying any machine learning algorithm, we first need to pre-process our dataset. Pre-processing includes cleaning the data, dealing with missing values, and analyzing the dataset, followed by feature scaling.

## So, What is Feature Scaling?

Feature scaling is a method of rescaling the features of a dataset so that they share a comparable range of values for proper mathematical calculations. Let's understand it with an example. Suppose we have a dataset:

```python
import pandas as pd

data = pd.read_csv(r"LR_House_price.csv")
data.head(5)
```

| | square_feet | price |
|---|---|---|
| 0 | 150 | 6450 |
| 1 | 200 | 7450 |
| 2 | 250 | 8450 |
| 3 | 300 | 9450 |
| 4 | 350 | 11450 |

Observe the values of the feature "square_feet" and the values of "price": the values in "price" are much larger than those in "square_feet". If we do mathematical calculations on these features directly, say running gradient descent to find the best-fit line, the very different scales force slow, unwieldy calculations and poor convergence. To avoid this, we scale the dataset so that each feature takes values in roughly the same range as the other features.
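To see the scale difference concretely, here is a minimal sketch using the five rows shown above as a hypothetical stand-in for the full CSV:

```python
import pandas as pd

# Hypothetical stand-in for the first rows of LR_House_price.csv
data = pd.DataFrame({
    "square_feet": [150, 200, 250, 300, 350],
    "price": [6450, 7450, 8450, 9450, 11450],
})

# The spread of "price" is 25x the spread of "square_feet"
spread = data.max() - data.min()
print(spread)
```

The spread of "price" (5000) dwarfs that of "square_feet" (200), which is exactly the imbalance feature scaling removes.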

# Python Library For Feature Scaling

Python's scikit-learn library contains these data pre-processing tools, and we just have to import them to make our life easy. So, let's import the `sklearn.preprocessing` module. Don't worry, I will explain all the types of feature scaling, but first I want to show you how this dataset will look after we apply one of the scaling methods, known as `StandardScaler`.

```python
import sklearn.preprocessing as pp

ss = pp.StandardScaler()             # load StandardScaler()
scaledData = ss.fit_transform(data)  # fit_transform() takes a dataset and scales it
scaledData
```

    array([[-1.2377055 , -1.11477282],
           [-0.87670806, -0.87091627],
           [-0.51571062, -0.62705971],
           [-0.15471319, -0.38320316],
           [ 0.20628425,  0.10450995],
           [ 0.56728169,  1.07993617],
           [ 2.01127143,  1.81150584]])

Whenever you scale data, the result comes back as a NumPy array, so you need to convert it back into a DataFrame:

```python
scaledData = pd.DataFrame(scaledData, columns=["square_feet", "price"])
scaledData
```

| | square_feet | price |
|---|---|---|
| 0 | -1.237705 | -1.114773 |
| 1 | -0.876708 | -0.870916 |
| 2 | -0.515711 | -0.627060 |
| 3 | -0.154713 | -0.383203 |
| 4 | 0.206284 | 0.104510 |
| 5 | 0.567282 | 1.079936 |
| 6 | 2.011271 | 1.811506 |

Observe the values of each feature: they are now on the same scale with respect to each other. The values have been reduced by a very large factor, which makes the calculations faster and easier when we apply a machine learning algorithm. Let's check whether scaling has changed the properties of the dataset or not.

# Before Scaling

```python
import matplotlib.pyplot as plt

plt.scatter(data["square_feet"], data["price"])
```

# After Scaling

```python
plt.scatter(scaledData["square_feet"], scaledData["price"])
```

You can observe that the trend remains the same.

# Types Of Feature Scaling

## 1. Standard Scaler

Standard Scaler scales your dataset so that its distribution is centered around 0 with a standard deviation of 1. The formula for standard scaling is

    (xi - mean(x)) / stdev(x)

Let's check if it is true or not!

## mean before scaling

```python
data.mean()
```

    square_feet      321.428571
    price          11021.428571
    dtype: float64

## mean after scaling

```python
scaledData.mean()
```

    square_feet   -1.903239e-16
    price          1.586033e-16
    dtype: float64

So, you can observe that the mean shifts to 0 after standard scaling.

## stdev before scaling

```python
data.std()
```

    square_feet     149.602648
    price          4429.339411
    dtype: float64

## stdev after scaling

```python
scaledData.std()
```

    square_feet    1.080123
    price          1.080123
    dtype: float64

See, after standard scaling the deviation comes to 1. (It prints 1.08 rather than exactly 1 only because pandas' `.std()` uses the sample standard deviation, ddof=1, while `StandardScaler` divides by the population standard deviation, ddof=0.)

### One point to note: the standard scaler is only a good choice when the dataset has a roughly normal distribution. For skewed distributions, we can go for other scaling methods.
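As a sanity check, the standard-scaling formula can be applied by hand and compared against `StandardScaler`. This sketch uses a hypothetical one-column array, not the CSV from above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical one-column dataset
x = np.array([[150.], [200.], [250.], [300.], [350.], [400.], [600.]])

manual = (x - x.mean()) / x.std()          # NumPy's std defaults to ddof=0
scaled = StandardScaler().fit_transform(x)

print(np.allclose(manual, scaled))  # True: the formula matches the scaler
```

NumPy's `std()` defaults to the population standard deviation (ddof=0), which is exactly what `StandardScaler` uses, so the two results agree to floating-point precision.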

## 2. MinMax Scaler

MinMax scaler is a method that rescales the dataset into the range 0 to 1 (or -1 to 1 if there are negative values). This scaler works better for cases in which the standard scaler might not work so well: if the distribution is not Gaussian or the standard deviation is very small, the min-max scaler is the better choice. The formula for the min-max scaler is

    (xi - min(x)) / (max(x) - min(x))

This method is not recommended if your dataset has outliers. Let's observe min-max in action. First, I will show you how our dataset looks in a kdeplot. Just observe the figure if you don't know what a kdeplot is.

```python
import seaborn as sea

sea.kdeplot(data['square_feet'])
sea.kdeplot(data['price'])
```

Now, let's apply the MinMax Scaler.

```python
minmax = pp.MinMaxScaler()
mmScale = minmax.fit_transform(data)
mmScale
```

    array([[0.        , 0.        ],
           [0.11111111, 0.08333333],
           [0.22222222, 0.16666667],
           [0.33333333, 0.25      ],
           [0.44444444, 0.41666667],
           [0.55555556, 0.75      ],
           [1.        , 1.        ]])

Observe: with min-max scaling the data points are shifted into the range 0 to 1. Let's observe how it looks in a kdeplot.

```python
sea.kdeplot(mmScale[:, 0])
sea.kdeplot(mmScale[:, 1])
```
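Those numbers can be reproduced directly from the min-max formula. A quick sketch on a hypothetical one-column array:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical one-column dataset
x = np.array([[150.], [200.], [250.], [300.], [350.], [400.], [600.]])

manual = (x - x.min()) / (x.max() - x.min())
scaled = MinMaxScaler().fit_transform(x)

print(np.allclose(manual, scaled))  # True
print(scaled.min(), scaled.max())   # 0.0 1.0
```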

One more thing you should observe: the starting and ending values of each feature are now almost the same. Let's observe whether it changes the trend of our dataset or not!

```python
plt.scatter(mmScale[:, 0], mmScale[:, 1])
```

Cool, our dataset still shows the same trend.

## 3. Robust Scaler

Robust scaling is used when the dataset has outliers. It is similar to the min-max scaler, but unlike the min-max scaler it uses the inter-quartile range of the dataset in its formula:

    (xi - median(x)) / (Q3(x) - Q1(x))

To demonstrate the Robust Scaler, I will create a dummy dataset using NumPy.

```python
import numpy as np

dummyData = pd.DataFrame({
    # Distribution with lower outliers
    'x1': np.concatenate([np.random.normal(10, 1, 1000), np.random.normal(1, 1, 25)]),
    # Distribution with higher outliers
    'x2': np.concatenate([np.random.normal(15, 1, 1000), np.random.normal(50, 1, 25)]),
})

sea.kdeplot(dummyData["x1"])
sea.kdeplot(dummyData["x2"])
```

```python
rs = pp.RobustScaler()
rscaledData = rs.fit_transform(dummyData)
rscaledData
```

    array([[-0.28695671, -0.43497726],
           [-1.57556592, -0.98123282],
           [ 0.55617446, -0.03027722],
           ...,
           [-6.08243683, 24.73749711],
           [-5.4203302 , 25.81466066],
           [-5.94025948, 26.08934034]])

```python
sea.kdeplot(rscaledData[:, 0])
sea.kdeplot(rscaledData[:, 1])
```
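The Robust Scaler output can likewise be checked against its formula (median and inter-quartile range). A sketch with a seeded hypothetical sample, so the comparison is reproducible:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(42)
# Hypothetical sample: a normal bulk plus a few low outliers
x = np.concatenate([rng.normal(10, 1, 1000), rng.normal(1, 1, 25)]).reshape(-1, 1)

q1, med, q3 = np.percentile(x, [25, 50, 75])
manual = (x - med) / (q3 - q1)
scaled = RobustScaler().fit_transform(x)

print(np.allclose(manual, scaled))  # True
```

Because the median and quartiles barely move when a few outliers are added, the bulk of the data keeps a sensible scale, which is the whole point of this scaler.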

## 4. Normalizer

Now, this scaling is very interesting and drives you through linear algebra. With this scaling, each data point is rescaled so that it lies at a distance of 1 unit from the origin. Let's understand it with an example. I will again use NumPy to create a dummy dataset.

```python
data = pd.DataFrame({
    'x1': np.random.randint(-100, 100, 1000).astype(float),
    'y1': np.random.randint(-80, 80, 1000).astype(float),
    'z1': np.random.randint(-150, 150, 1000).astype(float),
})
data
```

| | x1 | y1 | z1 |
|---|---|---|---|
| 0 | 92.0 | 0.0 | 123.0 |
| 1 | 76.0 | 44.0 | 132.0 |
| 2 | -37.0 | 76.0 | -148.0 |
| 3 | -1.0 | -77.0 | 131.0 |
| 4 | -29.0 | 28.0 | 6.0 |
| ... | ... | ... | ... |
| 995 | -6.0 | -7.0 | 127.0 |
| 996 | 74.0 | -50.0 | 21.0 |
| 997 | -94.0 | -24.0 | 6.0 |
| 998 | -31.0 | -1.0 | -86.0 |
| 999 | 62.0 | 7.0 | 69.0 |

Let's first talk about the linear algebra side. You can observe that our dataset has 3 features. If we treat these 3 features, namely x1, y1, z1, as three different dimensions, then we can plot a point, say (-39.0, -11.0, -105.0), in a coordinate system.

```python
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data['x1'], data['y1'], data['z1'])
```

Now, let's scale our dataset using the Normalizer method.

```python
normalizer = pp.Normalizer()
scaledData = normalizer.fit_transform(data)
scaledData
```

    array([[ 0.59895783,  0.        ,  0.80078057],
           [ 0.4793641 ,  0.27752659,  0.83257976],
           [-0.21708816,  0.44591081, -0.86835263],
           ...,
           [-0.96707   , -0.24691149,  0.06172787],
           [-0.33908651, -0.01093827, -0.9406916 ],
           [ 0.66647405,  0.07524707,  0.74172112]])

Now, let's plot our dataset.

```python
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(scaledData[:, 0], scaledData[:, 1], scaledData[:, 2])
```

Strange! What happens here is that every point in our dataset is now at a distance of 1 unit from the origin. Because of this, the dataset takes the shape of a sphere, which shows that every point lies at the same unit distance. The Normalizer scales each sample by dividing it by its magnitude (its Euclidean norm) in n-dimensional space, for n features. I hope that by now you know everything about feature scaling.
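That sphere can be verified numerically: after `Normalizer`, every row's Euclidean norm is 1. A minimal sketch with a few hypothetical points:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical 3-feature points
pts = np.array([[92., 0., 123.],
                [76., 44., 132.],
                [3., 4., 0.]])

normalized = Normalizer().fit_transform(pts)            # default norm: 'l2'
manual = pts / np.linalg.norm(pts, axis=1, keepdims=True)

print(np.allclose(normalized, manual))                  # True
print(np.linalg.norm(normalized, axis=1))               # [1. 1. 1.]
```

Note that, unlike the other scalers, `Normalizer` works row-wise (per sample) rather than column-wise (per feature).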
