Create your own dataset and do simple preprocessing
Dataset name: Data.csv (enter the following data in Excel and save it with the .csv extension)
Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
import numpy as np
import pandas as pd

df = pd.read_csv('Data.csv')
df
Output:
| | Country | Age | Salary | Purchased |
|---|---|---|---|---|
| 0 | France | 44.0 | 72000.0 | No |
| 1 | Spain | 27.0 | 48000.0 | Yes |
| 2 | Germany | 30.0 | 54000.0 | No |
| 3 | Spain | 38.0 | 61000.0 | No |
| 4 | Germany | 40.0 | NaN | Yes |
| 5 | France | 35.0 | 58000.0 | Yes |
| 6 | Spain | NaN | 52000.0 | No |
| 7 | France | 48.0 | 79000.0 | Yes |
| 8 | Germany | 50.0 | 83000.0 | No |
| 9 | France | 37.0 | 67000.0 | Yes |
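As a quick check that the missing entries are read as NaN, the sketch below builds the same ten rows from an in-memory string instead of a file. The names `CSV_TEXT` and `df_check` are illustrative and not part of the original program; the sketch assumes the two missing values are written as empty fields between commas.

```python
from io import StringIO

import pandas as pd

# Same ten rows as Data.csv; an empty field between two commas is a missing value.
CSV_TEXT = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
"""

# read_csv accepts any file-like object, so no file on disk is needed here.
df_check = pd.read_csv(StringIO(CSV_TEXT))

# Count the missing values in each column.
missing_per_column = df_check.isna().sum()
print(missing_per_column)

# Age and Salary are stored as float64 because NaN cannot live in an integer column.
print(df_check.dtypes)
```

This also explains why Age and Salary are displayed with a decimal point in the output above.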
Write a program in Python to perform the following tasks.
1. Import the dataset and do the following:
a) Describe the dataset
b) Show the shape of the dataset
c) Display the first 3 rows of the dataset
a) Describing the dataset
df.describe()
Output:
| | Age | Salary |
|---|---|---|
| count | 9.000000 | 9.000000 |
| mean | 38.777778 | 63777.777778 |
| std | 7.693793 | 12265.579662 |
| min | 27.000000 | 48000.000000 |
| 25% | 35.000000 | 54000.000000 |
| 50% | 38.000000 | 61000.000000 |
| 75% | 44.000000 | 72000.000000 |
| max | 50.000000 | 83000.000000 |
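By default `describe()` summarizes only the numeric columns, and `count` is 9 rather than 10 because missing values are skipped. As a hedged sketch, passing `include="all"` also summarizes the text columns. The frame `df_demo` below is an illustrative copy of the dataset built in code, not part of the original program.

```python
import numpy as np
import pandas as pd

# Illustrative copy of the dataset, with the two missing values as NaN.
df_demo = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# include="all" adds count/unique/top/freq rows for the text columns.
summary = df_demo.describe(include="all")
print(summary)

# Non-missing values per column: 9 for Age and Salary, 10 for the others.
non_missing = df_demo.count()
print(non_missing)
```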
b) Shape of the dataset
df.shape
Output:
(10, 4)
c) Display first 3 rows from dataset
df.head(3)
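Output:
| | Country | Age | Salary | Purchased |
|---|---|---|---|---|
| 0 | France | 44.0 | 72000.0 | No |
| 1 | Spain | 27.0 | 48000.0 | Yes |
| 2 | Germany | 30.0 | 54000.0 | No |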
2. Handling missing values:
a) Replace the missing values in the Age and Salary columns with the mean of each column.
from sklearn.impute import SimpleImputer

# Replace each NaN with the mean of its own column.
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

# Columns 1 and 2 are Age and Salary.
imputer.fit(df.iloc[:, 1:3])
df.iloc[:, 1:3] = imputer.transform(df.iloc[:, 1:3])
df
Output:
| | Country | Age | Salary | Purchased |
|---|---|---|---|---|
| 0 | France | 44.000000 | 72000.000000 | No |
| 1 | Spain | 27.000000 | 48000.000000 | Yes |
| 2 | Germany | 30.000000 | 54000.000000 | No |
| 3 | Spain | 38.000000 | 61000.000000 | No |
| 4 | Germany | 40.000000 | 63777.777778 | Yes |
| 5 | France | 35.000000 | 58000.000000 | Yes |
| 6 | Spain | 38.777778 | 52000.000000 | No |
| 7 | France | 48.000000 | 79000.000000 | Yes |
| 8 | Germany | 50.000000 | 83000.000000 | No |
| 9 | France | 37.000000 | 67000.000000 | Yes |
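The same mean imputation can also be done with pandas alone. The sketch below is an alternative to `SimpleImputer`, not the method used above; `df_demo`, `df_filled`, and `numeric_cols` are illustrative names.

```python
import numpy as np
import pandas as pd

# Illustrative copy of the dataset, with the two missing values as NaN.
df_demo = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

numeric_cols = ["Age", "Salary"]

# mean() skips NaN by default, so each mean is taken over the 9 known values.
column_means = df_demo[numeric_cols].mean()

# fillna with a Series fills each column with the value under its own label.
df_filled = df_demo.copy()
df_filled[numeric_cols] = df_filled[numeric_cols].fillna(column_means)
print(df_filled)
```

Working on a copy keeps the original frame unchanged, which makes it easy to compare the data before and after imputation.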
3. Data.csv has two categorical columns (the Country column and the Purchased column).
a. Apply one-hot encoding on the Country column.
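from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# One-hot encode column 0 (Country) and pass the other columns through unchanged.
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')

# Run this only once: df is overwritten, so a second run would encode the new column 0 again.
df = pd.DataFrame(ct.fit_transform(df))
df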
Output:
| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 44 | 72000 | No |
| 1 | 0 | 0 | 1 | 27 | 48000 | Yes |
| 2 | 0 | 1 | 0 | 30 | 54000 | No |
| 3 | 0 | 0 | 1 | 38 | 61000 | No |
| 4 | 0 | 1 | 0 | 40 | 63777.8 | Yes |
| 5 | 1 | 0 | 0 | 35 | 58000 | Yes |
| 6 | 0 | 0 | 1 | 38.7778 | 52000 | No |
| 7 | 1 | 0 | 0 | 48 | 79000 | Yes |
| 8 | 0 | 1 | 0 | 50 | 83000 | No |
| 9 | 1 | 0 | 0 | 37 | 67000 | Yes |
Columns 0, 1, and 2 are the indicators for France, Germany, and Spain (OneHotEncoder sorts the categories alphabetically); columns 3 to 5 are Age, Salary, and Purchased.
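Converting the transformer output back to a DataFrame loses the column names. As a hedged alternative (not the approach used above), `pd.get_dummies` one-hot encodes a column and keeps readable names. `df_demo`, `encoded`, and `dummy_cols` are illustrative names, and only the columns needed for the demonstration are included.

```python
import pandas as pd

# Illustrative subset of the dataset: the categorical column plus one numeric column.
df_demo = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, 38, 48, 50, 37],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# Replace Country with one 0/1 indicator column per category.
# dtype=int gives 0/1 integers instead of booleans.
encoded = pd.get_dummies(df_demo, columns=["Country"], dtype=int)
print(encoded)

# Exactly one indicator is set in every row.
dummy_cols = ["Country_France", "Country_Germany", "Country_Spain"]
print(encoded[dummy_cols].sum(axis=1).tolist())
```

The indicator columns are appended after the untouched columns, so the column order differs from the ColumnTransformer result, where the encoded columns come first.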
b. Apply label encoding on the Purchased column.
from sklearn.preprocessing import LabelEncoder

# Encode the last column (Purchased): No becomes 0 and Yes becomes 1.
le = LabelEncoder()
df.iloc[:, -1] = le.fit_transform(df.iloc[:, -1])
df
Output:
| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 44 | 72000 | 0 |
| 1 | 0 | 0 | 1 | 27 | 48000 | 1 |
| 2 | 0 | 1 | 0 | 30 | 54000 | 0 |
| 3 | 0 | 0 | 1 | 38 | 61000 | 0 |
| 4 | 0 | 1 | 0 | 40 | 63777.8 | 1 |
| 5 | 1 | 0 | 0 | 35 | 58000 | 1 |
| 6 | 0 | 0 | 1 | 38.7778 | 52000 | 0 |
| 7 | 1 | 0 | 0 | 48 | 79000 | 1 |
| 8 | 0 | 1 | 0 | 50 | 83000 | 0 |
| 9 | 1 | 0 | 0 | 37 | 67000 | 1 |
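To see which integer stands for which label, the fitted encoder can be inspected. The sketch below uses the Purchased values from the dataset as a plain list; `purchased`, `le_demo`, `codes`, `mapping`, and `decoded` are illustrative names.

```python
from sklearn.preprocessing import LabelEncoder

# Purchased column of the dataset, in row order.
purchased = ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"]

le_demo = LabelEncoder()
codes = le_demo.fit_transform(purchased)

# classes_ holds the labels in sorted order; the position is the assigned code.
mapping = {label: code for code, label in enumerate(le_demo.classes_)}
print(mapping)

# inverse_transform turns the integer codes back into the original labels.
decoded = le_demo.inverse_transform(codes)
print(list(decoded))
```

Because the labels are sorted before codes are assigned, "No" is always 0 and "Yes" is always 1 here, regardless of which value appears first in the data.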