BSC Computer Science - Data Science Assignment 3 (SET A)

Rohit Bairwa January 06, 2022

Create your own dataset and do simple preprocessing

Dataset name: Data.csv (enter the following data in Excel and save it with the .csv extension)

Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes


import numpy as np
import pandas as pd
# Load the CSV file; empty fields are read as NaN
df = pd.read_csv('Data.csv')
df

Output:

	Country	Age	Salary	Purchased
0	France	44.0	72000.0	No
1	Spain	27.0	48000.0	Yes
2	Germany	30.0	54000.0	No
3	Spain	38.0	61000.0	No
4	Germany	40.0	NaN	Yes
5	France	35.0	58000.0	Yes
6	Spain	NaN	52000.0	No
7	France	48.0	79000.0	Yes
8	Germany	50.0	83000.0	No
9	France	37.0	67000.0	Yes
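For readers who prefer not to go through Excel, the same file can also be written directly from Python. The sketch below is one possible way to do it and is not part of the original assignment. It leaves the missing Salary and Age cells empty between the commas so that pandas reads them as NaN.

```python
import pandas as pd

# Raw CSV text; the two empty fields mark the missing Salary (row 4) and Age (row 6).
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
"""

# Write the file to the current working directory, then load it back.
with open("Data.csv", "w") as f:
    f.write(csv_text)

df = pd.read_csv("Data.csv")
print(df.isna().sum())  # number of missing values per column
```

Because Age and Salary each contain a NaN, pandas stores both columns as floats, which is why the output above shows 44.0 rather than 44.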

Write a program in Python to perform the following tasks.

1. Import the dataset and do the following:

a) Describing the dataset

b) Shape of the dataset

c) Display first 3 rows from dataset

a) Describing the dataset

df.describe()

Output:

	Age	Salary
count	9.000000	9.000000
mean	38.777778	63777.777778
std	7.693793	12265.579662
min	27.000000	48000.000000
25%	35.000000	54000.000000
50%	38.000000	61000.000000
75%	44.000000	72000.000000
max	50.000000	83000.000000
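Note that count is 9 rather than 10 for both columns: describe() skips NaN values, and by default it summarises only the numeric columns. The following is a minimal sketch, assuming the same ten rows are built in memory (the DataFrame construction is only for illustration), of how to count the missing cells and how include="all" brings the text columns into the summary.

```python
import numpy as np
import pandas as pd

# Same ten rows as Data.csv, built in memory so the example is self-contained.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

missing = df.isna().sum()              # missing cells per column
summary = df.describe(include="all")   # also summarises Country and Purchased

print(missing)
print(summary.loc[["count", "unique"]])
```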

b) Shape of the dataset

df.shape

Output:

(10, 4)

c) Display first 3 rows from dataset

df.head(3)

Output:

	Country	Age	Salary	Purchased
0	France	44.0	72000.0	No
1	Spain	27.0	48000.0	Yes
2	Germany	30.0	54000.0	No

2. Handling missing values: a) Replace the missing values in the Salary and Age columns with the mean of each column.

from sklearn.impute import SimpleImputer
# Replace each NaN with the mean of its column
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# Columns 1 and 2 are Age and Salary
imputer.fit(df.iloc[:, 1:3])
df.iloc[:, 1:3] = imputer.transform(df.iloc[:, 1:3])
df

Output:

	Country	Age	Salary	Purchased
0	France	44.000000	72000.000000	No
1	Spain	27.000000	48000.000000	Yes
2	Germany	30.000000	54000.000000	No
3	Spain	38.000000	61000.000000	No
4	Germany	40.000000	63777.777778	Yes
5	France	35.000000	58000.000000	Yes
6	Spain	38.777778	52000.000000	No
7	France	48.000000	79000.000000	Yes
8	Germany	50.000000	83000.000000	No
9	France	37.000000	67000.000000	Yes
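The same mean imputation can be done with pandas alone. This is an alternative sketch, not the method used above: it swaps SimpleImputer for DataFrame.fillna, and only the two numeric columns are rebuilt here to keep the example short.

```python
import numpy as np
import pandas as pd

# Age and Salary as they appear in Data.csv, with one NaN in each column.
df = pd.DataFrame({
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
})

cols = ["Age", "Salary"]
filled = df.copy()
# mean() ignores NaN, so each gap is filled with the mean of the 9 known values.
filled[cols] = filled[cols].fillna(filled[cols].mean())

print(filled.loc[[4, 6]])
```

The filled values are 349/9 for Age and 574000/9 for Salary, which agree with the 38.777778 and 63777.777778 shown in the output above.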

3. Data.csv has two categorical columns (the Country column and the Purchased column).

a. Apply one-hot encoding to the Country column.

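from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# One-hot encode column 0 (Country); pass the remaining columns through unchanged
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
# fit_transform returns an array, so the original column names are replaced by integers
df = pd.DataFrame(ct.fit_transform(df))
df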

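Output (columns 2 to 4 are the one-hot indicators for France, Germany and Spain; columns 5 to 7 are Age, Salary and Purchased. Columns 0 and 1 repeat the France indicator and its complement; a single run of the code above on the four-column DataFrame produces only the other six columns, numbered 0 to 5):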

	0	1	2	3	4	5	6	7
0	1	0	1	0	0	44	72000	No
1	0	1	0	0	1	27	48000	Yes
2	0	1	0	1	0	30	54000	No
3	0	1	0	0	1	38	61000	No
4	0	1	0	1	0	40	63777.8	Yes
5	1	0	1	0	0	35	58000	Yes
6	0	1	0	0	1	38.7778	52000	No
7	1	0	1	0	0	48	79000	Yes
8	0	1	0	1	0	50	83000	No
9	1	0	1	0	0	37	67000	Yes
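Because the ColumnTransformer result loses the column names, it can be hard to tell which indicator belongs to which country. As an alternative sketch (pandas get_dummies instead of OneHotEncoder, with a reduced set of columns for brevity), the encoded columns can be given readable names:

```python
import pandas as pd

# Country plus two of the other columns from Data.csv (Age gaps omitted for brevity).
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Salary": [72000, 48000, 54000, 61000, 63777.8,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# One indicator column per country; dtype=int gives 0/1 instead of booleans.
encoded = pd.get_dummies(df, columns=["Country"], dtype=int)

print(encoded.columns.tolist())
print(encoded.head(3))
```

Here the untouched columns come first and the indicators follow, whereas ColumnTransformer places the encoded columns first. Either order works as long as later steps select columns consistently.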

b. Apply label encoding to the Purchased column.

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
# Encode the last column (Purchased): No becomes 0, Yes becomes 1
df.iloc[:,-1] = le.fit_transform(df.iloc[:,-1])
df

Output:

	0	1	2	3	4	5	6	7
0	1	0	1	0	0	44	72000	0
1	0	1	0	0	1	27	48000	1
2	0	1	0	1	0	30	54000	0
3	0	1	0	0	1	38	61000	0
4	0	1	0	1	0	40	63777.8	1
5	1	0	1	0	0	35	58000	1
6	0	1	0	0	1	38.7778	52000	0
7	1	0	1	0	0	48	79000	1
8	0	1	0	1	0	50	83000	0
9	1	0	1	0	0	37	67000	1
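LabelEncoder assigns integers in sorted order of the labels, so No becomes 0 and Yes becomes 1. A small sketch, using only the Purchased values from the dataset, shows how to inspect that mapping through classes_ and how to recover the original labels with inverse_transform:

```python
from sklearn.preprocessing import LabelEncoder

# The Purchased column from Data.csv, in row order.
purchased = ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"]

le = LabelEncoder()
codes = le.fit_transform(purchased)

# classes_ holds the labels in sorted order; the position of each label is its code.
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform turns the integer codes back into the original labels.
decoded = le.inverse_transform(codes)
print(list(decoded))
```

Keeping the fitted encoder around is useful when predictions later need to be reported as Yes or No rather than 1 or 0.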
