Categories
Python

Important functionalities of Pandas in Python : Tricks and Features

Pandas is one of my favorite libraries in python. It’s very useful to visualize the data in a clean structural manner. Nowadays Pandas is widely used in Data Science, Machine Learning and other areas. The functionality of Pandas cannot be covered by any other libraries so far. In this post, I’ll cover some mainly used functionalities and tricks in Pandas.

To start working with Pandas, Import the package pandas. The standard way to import pandas is using the convention pd.

import pandas as pd

DataFrame in Pandas

A DataFrame is similar to a Table. Whereas, It’s represented in Rows and Columns with 2-dimensional data structure as seen below.

Pandas Data Representation

Now , Let’s create a 2-Dimensional Data Structure using Pandas.

import pandas as pd import numpy as np # Creating a dictionary with the keys and their values data = { 'Name': ['Bishul Haq', 'Jhon Green', 'Siva', 'Malik', 'Silva', 'Joseph', 'katie', 'Naruto', 'Seth', 'Penny'], 'Age': [25, 26, 27, 25, 24, 23, 22, 21, 20, 19], 'Country': ['Sri Lanka', 'United Kingdom', 'India', 'Jordan', 'Brazil', 'Pakistan', 'U.S.A', 'Japan', 'China', 'Italy'] } # DataFrame df = pd.DataFrame(data, columns=['Name','Age','Country'])

Using head and tail to view the DataFrame

To view the DataFrame object, You can use head() and tail() method. By default it comes with 5 elements to display, you can also pass a custom number to see the selected number of data.

df.head()

Output :

If you want to see the first three elements, then you can pass number three to see the data as shown below.

df.head(3)

Output :

To check the last three elements in the DataFrame. You can use the method tail() with the number three to see the data as,

df.tail(3)

Opening Data files

Pandas provides different methods to read variety of data in different formats,

# To read a CSV file pd.read_csv('name_of_the_file') # To read a Encoded CSV file pd.read_csv('name_of_the_file', encoding = 'ISO-8859-1') # To read a delimited text file (like TSV) pd.read_table('name_of_the_file') # To read an Excel file pd.read_excel('name_of_the_file') # To read a JSON file. pd.read_json('name_of_the_json_file') # To read an html URL, string or file to extract tables for a list of dataframes pd.read_html('url') # To read contents of your clipboard pd.read_clipboard()

Writing Data Files

You can export the DataFrame to different file formats such as CSV, SQL, JSON and .xlsx (Excel File).

#To Write the DataFrame into CSV file pd.to_csv('name_of_the_file_to_save') #To Write the DataFrame into Excel file df.to_excel('name_of_the_file_to_save') #To Write the DataFrame into SQL table df.to_sql('name_of_the_table', 'connection_object') #To Write the DataFrame into JSON file df.to_json('name_of_the_file_to_save')

Knowing some Useful Info of the DataFrame

There are some methods which are exclusively useful to check relevant information about the DataFrame.

# Rows and Columns in the DataFrame df.shape # Description of Index df.index # Columns in the DataFrame df.columns # Count and Unique values for columns df.apply(pd.Series.value_counts) # Data Counts which are not-null df.count() # Summary for numerical columns df.describe() # Mean Value of all columns df.mean() # Correlation between columns df.corr() # highest value in each column df.max() # lowest value in each column df.min() # median of each column df.median() # standard deviation of each column df.std()

Checking missing values in a DataFrame

To check the null values in python we use isnull(), notnull() functions to check null and not null values which will return as True or False.

import pandas as pd import numpy as np # Creating a dictionary with the keys and their values data = { 'Name': ['Bishul Haq', 'Jhon Green', np.nan, 'Malik', 'Silva', 'Joseph', 'katie', np.nan, 'Seth', 'Penny'], 'Age': [25, 26, 27, 25, 24, 23, 22, 21, 20, np.nan], 'Country': ['Sri Lanka', np.nan, 'India', 'Jordan', 'Brazil', 'Pakistan', 'U.S.A', 'Japan', np.nan, 'Italy'] } # DataFrame df = pd.DataFrame(data, columns=['Name','Age','Country'])
# using isnull() function to check the null values df.isnull()

Output :

# using notnull() function to check the values which are not null df.notnull()

Output :

Filling missing values in a DataFrame

Sometimes there may not be any value presented in the datasets. In Pandas missing data is represented in two ways.

  1. NaN : NaN (Not a Number),  It is a special floating-point value and cannot be converted to any other type than float.
  2. None: It represents the missing data in python code.

Hence, Pandas recognise None and NaN as missing or null values. We use several functions like fillna() and replace() to fill the null values in a DataFrame.

import pandas as pd import numpy as np # Creating a dictionary with the keys and their values data = { 'P_ID': [112,243,223,225,np.nan], 'Age': [25, 26, 27, np.nan,23], 'Weight': [56,33,44,55,np.nan] } # DataFrame df = pd.DataFrame(data, columns=['P_ID','Age','Weight'])
# filling missing value using fillna() df.fillna(0)

Output :

Some other common ways to fill the missing values using fillna() function.

# filing a missing value with previous values df.fillna(method ='pad') # filling null value using fillna() function df.fillna(method ='bfill')

Filling values with -99 value using replace() function.

# Replace Nan value with value -99 df.replace(to_replace = np.nan, value = -99)

Dropping missing values in a DataFrame

To delete the null values from the DataFrame. We use the dropna() function, which will drop the Columns or Rows with null values.

import pandas as pd import numpy as np # Creating a dictionary with the keys and their values data = { 'P_ID': [112,243,223,225,np.nan], 'Age': [25, 26, 27, np.nan,23], 'Weight': [56,33,44,55,np.nan] } # DataFrame df = pd.DataFrame(data, columns=['P_ID','Age','Weight'])
# using dropna() function to drop the null values df.dropna()

Output :

Some other functionalities of dropna() to drop the missing values.

# Drop a columns which have at least 1 missing values df.dropna(axis = 1) # Making new DataFrame with dropped missing values new_df = df.dropna(axis = 0, how ='any') # Drop rows where all data is missing df.dropna(how = 'all')
Droping Columns and Rows
# Removing rows by index value s.drop([0, 1]) #Remove columns Age df.drop('Age', axis = 1)

Data Selection in a DataFrame

We can select a subset or position or an index of Data in a DataFrame using several functions like iloc, loc.

# Select by passing the column label df[col] # Select by passing columns as a new DataFrame df[[col1, col2]] # Select by position df.iloc[0] # Select by Index s.loc['index_one'] # Select by first row df.iloc[0,:] # Select the first element of first column df.iloc[0,0]

Combining Data in a DataFrame

To combine multiple DataFrame into single DataFrame. We can use append() , merge() , join() and concat() functions.

  • Merge: To combine Data on common column or indices.
  • Join: to combine Data on Key column or index.
  • Concat: To combine DataFrames across rows or columns
# Add df1 to the end of df2 (columns should be identical) df1.append(df2) # Add the columns in df1 to the end of df2 (rows should be identical) pd.concat([df1, df2],axis=1) # Join the columns in df1 with the columns on df2 (rows should be identical) with 'left', 'right', 'outer', 'inner' join. df1.join(df2,on=col1,how='inner') # Merging df1 and df2 DataFrames merged_df = pd.merge(df1, df2)

Hope you have gone through the tricks and features of Pandas 😊. If you like this please share with others and drop your ideas and suggestions at the comment section.

Latest Posts
Make a Donation

Leave a Reply

Your email address will not be published. Required fields are marked *