Published: Mar 11, 2020

Predicting per capita income of the US using linear regression

Python lets us analyze data and make predictions from it using linear regression. Linear regression is one of the most basic machine learning and statistical techniques: it models the relationship between variables as a straight line.

 

In machine learning and data science, regression is one of the most important areas, and many regression methods are available today. Linear regression is one of them. Regression is used to find the relationship between variables. Given existing data of per capita income by year, we can fit a line and use it to predict the income for a future year. I’ll be using the scikit-learn library to implement linear regression.
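To make the idea concrete, here is a minimal sketch of what fitting a straight line means, using NumPy only. The numbers are made up for illustration (they are not World Bank data), and the variable names are my own.

```python
import numpy as np

# Hypothetical data: income rises by exactly 500 per year, starting at 10000.
years = np.array([2000, 2001, 2002, 2003, 2004], dtype=float)
income = 10000 + 500 * (years - 2000)

# Least-squares slope and intercept of the line: income = slope * year + intercept
year_dev = years - years.mean()
income_dev = income - income.mean()
slope = np.sum(year_dev * income_dev) / np.sum(year_dev ** 2)
intercept = income.mean() - slope * years.mean()

# Extrapolate the fitted line to a year outside the data.
predicted_2010 = slope * 2010 + intercept
print(slope, intercept, predicted_2010)
```

scikit-learn does the same kind of fit for us, which is what the rest of this post uses.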

 

Import the relevant packages

Import numpy, matplotlib for charts, pandas for reading CSV files, and the linear_model module from sklearn, which provides the LinearRegression class.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
 

Read the CSV file

You can download US per capita income data from the World Bank or from other data sources. I gathered the data from the World Bank, and it is included in the CSV file gdp-per-capita-us.csv.

 

Read the CSV file with the read_csv function from pandas.

 
csv = pd.read_csv("gdp-per-capita-us.csv")
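The code in this post accesses csv.year and csv.income, so the file needs two columns with those names. Here is a minimal sketch of that layout, using an in-memory file with invented values as a stand-in for gdp-per-capita-us.csv, plus a few quick checks that are worth running before fitting a model.

```python
import io
import pandas as pd

# Hypothetical stand-in for gdp-per-capita-us.csv; the values are invented.
sample = io.StringIO("year,income\n2000,10000\n2001,10500\n2002,11000\n")
df = pd.read_csv(sample)

print(df.head())        # first rows of the table
print(df.dtypes)        # both columns should be numeric
print(df.isna().sum())  # number of missing values per column
```

If your own file uses different column names, adjust the names in the plotting and fitting code to match.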
 

Display using Scatter Chart

Display the data from the CSV file in a scatter chart using the Matplotlib library.

plt.scatter(csv.year, csv.income, marker="*",color="green")
plt.plot(csv.year,csv.income, color="yellow")
plt.xlabel("Year")
plt.ylabel("Income")
plt.show()

 

Run Linear Regression

Create the linear regression model, fit it on the year and income columns, and predict the income for the year 2020.

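l_r = linear_model.LinearRegression()
# Fit on the year column (2-D input) against income (1-D target)
l_r.fit(csv[['year']], csv.income)
# Use a DataFrame with the same column name to avoid a feature-name warning
l_r.predict(pd.DataFrame({'year': [2020]}))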
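The same model can predict several years in one call. The sketch below assumes made-up data in the same shape as the CSV (columns year and income); the names data, model and future are my own and not part of the original program.

```python
import pandas as pd
from sklearn import linear_model

# Hypothetical data in the same shape as the CSV (columns: year, income).
data = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003],
    "income": [10000, 10500, 11000, 11500],
})

model = linear_model.LinearRegression()
model.fit(data[["year"]], data["income"])

# Predict for several future years at once; predict returns one value per row.
future = pd.DataFrame({"year": [2020, 2021, 2022]})
future["predicted_income"] = model.predict(future[["year"]])
print(future)
```

Keep in mind that predictions far outside the years in the data rely on the trend staying linear.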

You can also predict the income for 2020 manually using the equation of a straight line, y = m*x + b, where m is the slope (l_r.coef_) and b is the intercept (l_r.intercept_):

l_r.coef_       # slope, an array with one value per feature
l_r.intercept_  # intercept
y = l_r.coef_[0] * 2020 + l_r.intercept_
print("Predicted Income of 2020 is %d" % y)
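As a quick sanity check, the manual calculation should agree with what predict returns. A minimal sketch, again on invented data with my own variable names:

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Hypothetical data; the values are invented and not perfectly linear.
data = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003],
    "income": [10000, 10800, 11300, 12100],
})
model = linear_model.LinearRegression().fit(data[["year"]], data["income"])

slope = model.coef_[0]        # coef_ is an array with one entry per feature
intercept = model.intercept_  # a single float when the target is one column

manual = slope * 2020 + intercept
from_model = model.predict(pd.DataFrame({"year": [2020]}))[0]

# Compare with a tolerance instead of ==, since both are floating-point results.
print(np.isclose(manual, from_model))
```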

The final program looks like this:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model

csv = pd.read_csv("gdp-per-capita-us.csv")

plt.scatter(csv.year, csv.income, marker="*",color="green")
plt.plot(csv.year,csv.income, color="yellow")
plt.xlabel("Year")
plt.ylabel("Income")
plt.show()

l_r = linear_model.LinearRegression()
l_r.fit(csv[['year']], csv.income)
# Print the result so it is visible when run as a script
print(l_r.predict(pd.DataFrame({'year': [2020]})))

# To predict manually using the equation of the straight line

print(l_r.coef_)
print(l_r.intercept_)
y = l_r.coef_[0] * 2020 + l_r.intercept_
print("Predicted Income of 2020 is %d" % y)