• Calculating...
  • 4 years ago
  • 5.1K Views

Important functionalities of Pandas in Python : Tricks and Features

Pandas is one of my favorite libraries in python. It’s very useful to visualize the data in a clean structural manner. Nowadays Pandas is widely used in Data Science, Machine Learning and other areas. The functionality of Pandas cannot be covered by any other libraries so far. In this post, I’ll cover some mainly used functionalities and tricks in Pandas..

 

To start working with Pandas, Import the package pandas. The standard way to import pandas is using the convention pd.

import pandas as pd
 

DataFrame in Pandas

A DataFrame is similar to a Table. Whereas, It’s represented in Rows and Columns with 2-dimensional data structure as seen below.

DataFrame

 

Now , Let’s create a 2-Dimensional Data Structure using Pandas.

import pandas as pd 
import numpy as np

# Creating a dictionary with the keys and their values
data = {
'Name': ['Bishul Haq', 'Jhon Green', 'Siva', 'Malik', 'Silva', 'Joseph', 'katie', 'Naruto', 'Seth', 'Penny'],
'Age': [25, 26, 27, 25, 24, 23, 22, 21, 20, 19],
'Country': ['Sri Lanka', 'United Kingdom', 'India', 'Jordan', 'Brazil', 'Pakistan', 'U.S.A', 'Japan', 'China', 'Italy']
}

# DataFrame
df = pd.DataFrame(data, columns=['Name','Age','Country'])
 

Using head and tail to view the DataFrame

To view the DataFrame object, You can use head() and tail() method. By default it comes with 5 elements to display, you can also pass a custom number to see the selected number of data.

df.head()

Output :

If you want to see the first three elements, then you can pass number three to see the data as shown below.

df.head(3)
 

Output :

To check the last three elements in the DataFrame. You can use the method tail() with the number three to see the data as,

df.tail(3)
 

Opening Data files

Pandas provides different methods to read variety of data in different formats,

# To read a CSV file
pd.read_csv('name_of_the_file')

# To read a Encoded CSV file
pd.read_csv('name_of_the_file', encoding = 'ISO-8859-1')

# To read a delimited text file (like TSV)
pd.read_table('name_of_the_file')

# To read an Excel file
pd.read_excel('name_of_the_file')

# To read a JSON file.
pd.read_json('name_of_the_json_file')

# To read an html URL, string or file to extract tables for a list of dataframes
pd.read_html('url')

# To read contents of your clipboard
pd.read_clipboard()

Writing Data Files

You can export the DataFrame to different file formats such as CSV, SQL, JSON and .xlsx (Excel File).

#To Write the DataFrame into CSV file
pd.to_csv('name_of_the_file_to_save')

#To Write the DataFrame into Excel file
df.to_excel('name_of_the_file_to_save')

#To Write the DataFrame into SQL table
df.to_sql('name_of_the_table', 'connection_object')

#To Write the DataFrame into JSON file
df.to_json('name_of_the_file_to_save')

Knowing some Useful Info of the DataFrame

There are some methods which are exclusively useful to check relevant information about the DataFrame.

# Rows and Columns in the DataFrame
df.shape

# Description of Index
df.index

# Columns in the DataFrame
df.columns

# Count and Unique values for columns
df.apply(pd.Series.value_counts)

# Data Counts which are not-null
df.count()

# Summary for numerical columns
df.describe()

# Mean Value of all columns 
df.mean()

# Correlation between columns
df.corr()

# highest value in each column
df.max()

# lowest value in each column 
df.min()

# median of each column
df.median()

# standard deviation of each column
df.std()
 

Checking missing values in a DataFrame

To check the null values in python we use isnull()notnull() functions to check null and not null values which will return as True or False.

import pandas as pd 
import numpy as np
  
# Creating a dictionary with the keys and their values
data = {
'Name': ['Bishul Haq', 'Jhon Green', np.nan, 'Malik', 'Silva', 'Joseph', 'katie', np.nan, 'Seth', 'Penny'],
'Age': [25, 26, 27, 25, 24, 23, 22, 21, 20, np.nan],
'Country': ['Sri Lanka', np.nan, 'India', 'Jordan', 'Brazil', 'Pakistan', 'U.S.A', 'Japan', np.nan, 'Italy']
}
  
# DataFrame
df = pd.DataFrame(data, columns=['Name','Age','Country'])
 
# using isnull() function to check the null values  
df.isnull()  

Output :

# using notnull() function to check the values which are not null   
df.notnull()

Output :

Filling missing values in a DataFrame

Sometimes there may not be any value presented in the datasets. In Pandas missing data is represented in two ways.

  1. NaN : NaN (Not a Number),  It is a special floating-point value and cannot be converted to any other type than float.
  2. None: It represents the missing data in python code.

Hence, Pandas recognise None and NaN as missing or null values. We use several functions like fillna() and replace() to fill the null values in a DataFrame.

 
import pandas as pd 
import numpy as np
  
# Creating a dictionary with the keys and their values
data = {
'P_ID': [112,243,223,225,np.nan],
'Age': [25, 26, 27, np.nan,23],
'Weight': [56,33,44,55,np.nan]
}
  
# DataFrame
df = pd.DataFrame(data, columns=['P_ID','Age','Weight'])
 
# filling missing value using fillna()   
df.fillna(0)
 

Output :

Some other common ways to fill the missing values using fillna() function.

# filing a missing value with previous values   
df.fillna(method ='pad')

# filling  null value using fillna() function   
df.fillna(method ='bfill')   

Filling values with -99 value using replace() function.

 
# Replace  Nan value with value -99   
df.replace(to_replace = np.nan, value = -99)
 
Output :

Dropping missing values in a DataFrame

To delete the null values from the DataFrame. We use the dropna() function, which will drop the Columns or Rows with null values.

 
import pandas as pd 
import numpy as np
  
# Creating a dictionary with the keys and their values
data = {
'P_ID': [112,243,223,225,np.nan],
'Age': [25, 26, 27, np.nan,23],
'Weight': [56,33,44,55,np.nan]
}
  
# DataFrame
df = pd.DataFrame(data, columns=['P_ID','Age','Weight'])
 
# using dropna() function to drop the null values   
df.dropna()  
 
 

Output :

Some other functionalities of dropna() to drop the missing values.

 
# Drop a columns which have at least 1 missing values      
df.dropna(axis = 1) 

# Making new DataFrame with dropped missing values  
new_df = df.dropna(axis = 0, how ='any')

# Drop rows where all data is missing
df.dropna(how = 'all')

Droping Columns and Rows

# Removing rows by index value
s.drop([0, 1])

#Remove columns Age
df.drop('Age', axis = 1)
 

Data Selection in a DataFrame

We can select a subset or position or an index of Data in a DataFrame using several functions like ilocloc.

 
# Select by passing the column label
df[col]

# Select by passing columns as a new DataFrame
df[[col1, col2]]

# Select by position
df.iloc[0]

# Select by Index
s.loc['index_one']

# Select by first row
df.iloc[0,:]

# Select the first element of first column
df.iloc[0,0]

Combining Data in a DataFrame

To combine multiple DataFrame into single DataFrame. We can use append() , merge() , join() and concat() functions.

  • Merge: To combine Data on common column or indices.
  • Join: to combine Data on Key column or index.
  • Concat: To combine DataFrames across rows or columns
# Add df1 to the end of df2 (columns should be identical)
df1.append(df2)

# Add the columns in df1 to the end of df2 (rows should be identical)
pd.concat([df1, df2],axis=1)

# Join the columns in df1 with the columns on df2 (rows should be identical) with 'left', 'right', 'outer', 'inner' join.
df1.join(df2,on=col1,how='inner')

# Merging df1 and df2 DataFrames
merged_df = pd.merge(df1, df2)

Hope you have gone through the tricks and features of Pandas 😊. If you like this please share with others and drop your ideas and suggestions at the comment section.

Share:

Related Post

How to get data from twitter using Tweepy in Python?

To start working on Python you need to have Python installed on your PC. If you haven’t installed python. Go to the Python website and get it installed.

  • 4 years ago

Predicting per capita income of the US using linear regression

Python enables us to predict and analyze any given data using Linear regression. Linear Regression is one of the basic machine learning or statistical techniques created to solve complex problems.

  • 4 years ago

Essential Sorting Algorithms for Computer Science Students

Algorithms are commonly taught in Computer Science, Software Engineering subjects at your Bachelors or Masters. Some find it difficult to understand due to memorizing.

  • 4 years ago