Machine Learning: Simple Linear Regression With Python

Buyung Hardiansyah
3 min read · May 12, 2022

Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values.

There are three popular types of machine learning: supervised learning, unsupervised learning, and reinforcement learning.

In this tutorial, we will look at supervised learning, using the simple linear regression algorithm in Python.

Linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables.

Simple linear regression is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable and finds a linear function that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. The adjective simple refers to the fact that the outcome variable is related to a single predictor.
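To make "as accurately as possible" concrete: ordinary least squares, the method behind most linear regression implementations, picks the slope and intercept that minimize the sum of squared errors. Below is a minimal sketch in plain Python. The function name `fit_line` and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # the fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With real, noisy data the points will not sit exactly on the line, but the same two formulas still give the best-fitting slope and intercept in the least squares sense.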

What we will learn in this tutorial:
1. Load Python Libraries
2. Load the Dataset
3. Create Scatter Plot
4. Modelling
5. Predictions

1. Load Python Libraries

First, load the Python libraries that we will use in this tutorial.

  • We will use the LinearRegression class from scikit-learn for the linear regression algorithm.
  • The train_test_split function is used to split our data into training and testing sets.
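A typical set of imports for these steps might look like the following. It assumes pandas, matplotlib, and scikit-learn are installed. The aliases `pd` and `plt` are common conventions, not requirements.

```python
import pandas as pd                # load and inspect the dataset
import matplotlib.pyplot as plt    # scatter plot and regression line

from sklearn.linear_model import LinearRegression     # the regression model
from sklearn.model_selection import train_test_split  # train/test splitting
```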

2. Load the Dataset

In this tutorial we will use the dataset from this URL: https://www.kaggle.com/datasets/carrie1/ecommerce-data .
Let's import the dataset using the pandas library.
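Loading could be sketched as follows. The helper name `load_customers` is hypothetical. The column names `Length of Membership` and `Yearly Amount Spent` are taken from the text of this tutorial and are assumed to match the downloaded CSV exactly. The three rows below are placeholders so the snippet runs on its own; they are not real data.

```python
import io

import pandas as pd


def load_customers(source):
    """Read the CSV and keep only the two columns used in this tutorial."""
    df = pd.read_csv(source)
    return df[["Length of Membership", "Yearly Amount Spent"]]


# Placeholder rows for illustration only. In practice, pass the path of the
# CSV file you downloaded from Kaggle instead of this in-memory text.
sample_csv = io.StringIO(
    "Length of Membership,Yearly Amount Spent\n"
    "1.0,300.0\n"
    "2.0,365.0\n"
    "3.0,420.0\n"
)
df = load_customers(sample_csv)
print(df.head())      # first rows
print(df.describe())  # summary statistics per column
```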

3. Create Scatter Plot

Next we will create a scatter plot.

From the scatter plot, we can see a clear positive correlation between the two variables.
This means that as Length of Membership increases, Yearly Amount Spent tends to increase as well.
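A scatter plot like this can be drawn with matplotlib, and the visual impression can be checked with the Pearson correlation coefficient. The sketch below uses synthetic stand-in data with arbitrary coefficients, not the Kaggle dataset, so the printed number says nothing about the real data.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (NOT the Kaggle dataset); coefficients are arbitrary
rng = np.random.default_rng(0)
membership = rng.uniform(0.5, 6.0, size=200)
spent = 270 + 64 * membership + rng.normal(0, 40, size=200)

fig, ax = plt.subplots()
ax.scatter(membership, spent, s=10)
ax.set_xlabel("Length of Membership")
ax.set_ylabel("Yearly Amount Spent")
fig.savefig("scatter.png")

# Pearson correlation: close to +1 means a strong positive linear relationship
r = np.corrcoef(membership, spent)[0, 1]
print(f"correlation: {r:.2f}")
```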

4. Modelling

Now that we have an idea of the data's statistics, the next step is to build the model.

First of all, we have to divide the data into "attributes" and "target labels". Attributes are independent variables, and target labels are dependent variables whose values are to be predicted. In our dataset we only have two columns. We want to predict the Yearly Amount Spent based on the Length of Membership.
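In pandas, separating attributes from labels is a matter of column selection. A minimal sketch, assuming the column names used in the text and a few placeholder rows in place of the real data:

```python
import pandas as pd

# Placeholder rows for illustration only (not the Kaggle data)
df = pd.DataFrame({
    "Length of Membership": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Yearly Amount Spent": [300.0, 365.0, 420.0, 500.0, 560.0],
})

# Double brackets keep X two-dimensional, shape (n_samples, 1),
# which is the layout scikit-learn estimators expect for features.
X = df[["Length of Membership"]]
# Single brackets give a one-dimensional Series for the target.
y = df["Yearly Amount Spent"]

print(X.shape, y.shape)
```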

Now that we have our attributes and labels, the next step is to split this data into training and test sets. We'll do this by using scikit-learn's built-in train_test_split() function.

We put 80% of the data in the training set and 20% in the test set. The test_size parameter of train_test_split() is where we specify the proportion of the test set.
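An 80/20 split can be sketched like this. The data here is synthetic, and `random_state=0` is an added assumption that makes the shuffle reproducible; the variable names follow the `x_train` / `y_train` naming used in this tutorial.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples with a single feature
X = np.arange(100, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# test_size=0.2 puts 20% of the rows in the test set, the rest in training
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(x_train), len(x_test))
```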

The final step is to train the model by calling the fit method:
lin_reg.fit(x_train, y_train)
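For context, `lin_reg` has to be created before `fit` is called, and after fitting the learned slope and intercept are available as attributes. A small sketch on synthetic points that lie exactly on a known line, so the result is easy to check:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in training data on the exact line y = 3x + 5
x_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([8.0, 11.0, 14.0, 17.0])

lin_reg = LinearRegression()
lin_reg.fit(x_train, y_train)

# coef_ holds one slope per feature; intercept_ is the constant term
slope = lin_reg.coef_[0]
intercept = lin_reg.intercept_
print(slope, intercept)
```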

5. Predictions

Now that we have trained our algorithm, it's time to make some predictions. To do so, we will use our test data and see how accurately our algorithm predicts the Yearly Amount Spent.

The red line is the regression line from the model we trained in the previous step.
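Predicting on the test set and drawing the regression line could look like the sketch below. It uses synthetic stand-in data with arbitrary coefficients, and it adds `r2_score` as one possible way to quantify accuracy; the choice of metric is an assumption, not something stated in this tutorial.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Kaggle dataset)
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 6.0, size=(200, 1))
y = 270 + 64 * X.ravel() + rng.normal(0, 40, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
lin_reg = LinearRegression().fit(x_train, y_train)

# Predict on data the model has not seen during training
y_pred = lin_reg.predict(x_test)
r2 = r2_score(y_test, y_pred)
print(f"R^2 on the test set: {r2:.2f}")

# Test points as dots, model predictions as a red line
order = np.argsort(x_test.ravel())
fig, ax = plt.subplots()
ax.scatter(x_test, y_test, s=10, label="test data")
ax.plot(x_test[order], y_pred[order], color="red", label="regression line")
ax.set_xlabel("Length of Membership")
ax.set_ylabel("Yearly Amount Spent")
ax.legend()
fig.savefig("regression_line.png")
```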

Now let's predict the yearly amount spent when the length of membership is 2 years.

Predicted yearly amount spent for a membership length of 2 years
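A single-value prediction can be sketched as follows. The one detail that often trips people up is that `predict` expects a two-dimensional input, even for one sample. The training data here is synthetic, so the number it produces is not the tutorial's actual result.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in training data on the exact line y = 3x + 5
x_train = np.array([[1.0], [3.0], [4.0], [5.0]])
y_train = np.array([8.0, 14.0, 17.0, 20.0])
lin_reg = LinearRegression().fit(x_train, y_train)

# predict() expects shape (n_samples, n_features), hence the double brackets
prediction = lin_reg.predict(np.array([[2.0]]))[0]

# The same value computed by hand from the fitted line
manual = lin_reg.coef_[0] * 2.0 + lin_reg.intercept_
print(prediction, manual)
```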

Well done! We have finished the machine learning tutorial using the simple linear regression algorithm. I hope it has shown you how to use simple linear regression. You can also find the full project in the GitHub repository.

Thank you for reading this article. Clap for this post if you enjoyed it.

Buyung Hardiansyah

Software Developer -- Golang | Python | PHP | Reactjs | Flutter