Simple Linear Regression – Knime

1 371 3 minutes read

Linear regression is a statistical method for modelling the relationship between two variables. It examines a set of data points and fits a trend line through them. That line can then serve as a predictive model that reveals trends in data that look noisy at first glance, for example in cancer diagnoses or stock prices.

Simple linear regression predicts a response using a single feature. Given a set of input values (X) and responses (Y), it fits the line that minimizes the sum of the squared vertical distances (the residuals) between the observed points and the line.

In simple terms, linear regression helps find the relationship between two quantities. It is a supervised learning algorithm: it learns from examples whose responses are already known.

The regression equation has the form:

$y = b_0 + b_1 x + e$

y = Dependent variable

x = Independent variable

The term $b_0$ is the intercept, $b_1$ is the slope of the regression line, $x$ is the input variable, $e$ is the error term, and $y$ is the observed value of the response variable. The value predicted by the model is $\hat{y} = b_0 + b_1 x$.
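The least-squares idea described earlier can be written out with these symbols. The block below uses the standard textbook closed form; $n$, $x_i$, $y_i$, $\bar{x}$ and $\bar{y}$ (the number of data points, the individual observations and their sample means) are notation introduced here, not in the original.

```latex
\min_{b_0,\, b_1} \sum_{i=1}^{n} \bigl(y_i - (b_0 + b_1 x_i)\bigr)^2

b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The second line follows from setting the derivatives of the squared-error sum with respect to $b_0$ and $b_1$ to zero; it also shows that the fitted line always passes through $(\bar{x}, \bar{y})$.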

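The slope $b_1$ gives the expected change in the output for a one-unit increase in the input. It describes an association and does not by itself prove that the input causes the change.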
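A minimal from-scratch sketch in Python of the closed-form estimates, assuming a small hypothetical dataset that lies exactly on y = 2 + 3x. The function names `fit_simple_linear_regression` and `predict` are illustrative and are not part of KNIME.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Slope: co-variation of x and y divided by the spread of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (x_mean, y_mean).
    b0 = y_mean - b1 * x_mean
    return b0, b1


def predict(b0, b1, x):
    """Predicted response y_hat = b0 + b1 * x."""
    return b0 + b1 * x


# Hypothetical toy data lying exactly on y = 2 + 3x.
xs = [0, 1, 2, 3]
ys = [2, 5, 8, 11]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1)  # 2.0 3.0
# Increasing x by one unit changes the prediction by b1.
print(predict(b0, b1, 5) - predict(b0, b1, 4))  # 3.0
```

Because the toy points lie on a perfect line, the fit recovers b0 = 2 and b1 = 3, and moving x from 4 to 5 raises the prediction by exactly b1.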

[Figure: Simple Linear Regression]

A No-Code Approach to Linear Regression with Knime

KNIME, the Konstanz Information Miner, is an open-source data analytics, reporting and integration platform. Let's work through a regression example in the KNIME Analytics Platform, using the simple regression tree.

Exploring the Dataset

The Iris flower data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.

The Iris dataset we are using has the following attributes:

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class:
— Iris Setosa
— Iris Versicolour
— Iris Virginica

Once the file is read into KNIME with a File Reader node, we would normally apply some pre-processing. The Iris dataset is already clean, however, and its four measurement attributes are numeric, so no pre-processing is needed here.
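Outside KNIME, the same "is it clean and numeric?" check can be sketched in plain Python. This assumes the data is a headerless CSV with the four measurement columns followed by the class label, as in the UCI iris.data file; the inline rows and the helper name `load_and_check` are illustrative.

```python
import csv
import io

# Illustrative rows in the headerless five-column Iris CSV layout.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]


def load_and_check(text):
    """Parse rows, convert the four measurements to float, and fail on gaps."""
    rows = []
    for line_no, record in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not record:
            continue  # skip blank lines
        if len(record) != len(COLUMNS) or any(v.strip() == "" for v in record):
            raise ValueError(f"row {line_no} is incomplete: {record}")
        # float() raises ValueError if a measurement is not numeric.
        measurements = [float(v) for v in record[:4]]
        rows.append(dict(zip(COLUMNS, measurements + [record[4]])))
    return rows


rows = load_and_check(SAMPLE)
print(len(rows), rows[0]["petal_length"])  # 4 1.4
```

If any row had a missing value or a non-numeric measurement, the function would stop with an error, which is the signal that pre-processing is needed after all.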

Train – Test Split

Now the dataset is in a form that can be used to train and test a linear regressor. The last step before training is to split the complete data into training and test sets. To do so, we use the Partitioning node. In its configuration, we choose stratified sampling, with 80% of the rows as training data and the remaining 20% as test data.
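The stratified 80/20 split can be sketched as follows: rows are grouped by class, each group is shuffled with a fixed seed, and 80% of every group goes to the training set. This is a hypothetical helper (`stratified_split`), not the Partitioning node's implementation, and the 150-row table is a stand-in that carries only labels.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, train_fraction=0.8, seed=42):
    """Split rows so each class keeps the same share in train and test."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label in sorted(by_class):
        group = by_class[label][:]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Hypothetical stand-in for the Iris table: 50 rows per class, labels only.
rows = [{"id": i, "class": c}
        for c in ("setosa", "versicolor", "virginica")
        for i in range(50)]
train, test = stratified_split(rows, "class")
print(len(train), len(test))  # 120 30
```

With 50 rows per class, each class contributes 40 rows to training and 10 to testing, so the class proportions are identical in both sets.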

Training and Testing the Model

KNIME provides a Linear Regression Learner node and a Regression Predictor node for training and applying a linear regression model. (The Simple Regression Tree Learner node, by contrast, learns a single regression tree.) We feed the Iris training data from the Partitioning node into the Learner node, which outputs the trained model.

Then we feed the output model and the test data to the Predictor node, which outputs the predicted class score for each row of the Iris test data.

Evaluating the Model

We use the following metrics to evaluate the linear regression model:

R-Square value
Mean Absolute Error
Mean Squared Error
Root Mean Squared Error

Mean Absolute Error, Mean Squared Error and Root Mean Squared Error measure how far the predicted values deviate from the actual values, while R-Square measures the share of the variance in the actual values that the model explains. We can calculate all of them directly with the Numeric Scorer node, which takes a column of actual values and a column of predicted values as input and computes these statistics between them.
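For reference, the four metrics can be computed by hand from their standard textbook definitions. This sketch (`regression_metrics` is a hypothetical name) is not the Numeric Scorer's code; its exact conventions are documented in the node description.

```python
import math


def regression_metrics(actual, predicted):
    """Return R^2, MAE, MSE and RMSE for paired actual/predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equally long, non-empty sequences")
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_res = sum(r * r for r in residuals)                # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse}


metrics = regression_metrics([3, 5, 7], [2, 5, 9])
print(metrics)  # R2 0.375, MAE 1.0, MSE about 1.667, RMSE about 1.291
```

RMSE is reported in the same unit as the target, which usually makes it easier to interpret than MSE.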

Our model has an R-Square value of 88.3%, which means that it explains 88.3% of the variance in the target values of the test data.
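The textbook definition makes this interpretation concrete: $R^2$ compares the squared errors left over after prediction with the total squared variation of the actual values $y_i$ around their mean $\bar{y}$. Applied to the reported value:

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

R^2 = 0.883 \;\Rightarrow\; SS_{\text{res}} = (1 - 0.883)\, SS_{\text{tot}} = 0.117\, SS_{\text{tot}}
```

In other words, the squared prediction errors on the test data amount to 11.7% of the total variation in the target.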

Line Plot

The final step is visualization. We use KNIME's Line Plot node, which plots the numeric columns of the input table as lines, to visualize the performance of the simple regression tree.

When you open the view of the Simple Regression Tree Learner node, you can see the learned decision tree in tabulated form.

When you open the output of the Simple Regression Tree Predictor node, it shows the values predicted for the test data set.
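To give a feel for what a regression tree learns, here is a sketch of the smallest possible one: a single split (a "stump") chosen to minimize the total squared error, with the mean response predicted on each side. This is a simplified illustration under that assumed split criterion; KNIME's Simple Regression Tree Learner grows deeper trees with its own settings. The data and the function names `fit_stump` and `predict_stump` are hypothetical.

```python
def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one threshold, a mean on each side."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Split quality: total squared error around each side's mean.
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    """Predict the mean of the side of the threshold that x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


stump = fit_stump([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5])
print(stump)  # (6.5, 1.0, 5.0)
```

Unlike a regression line, the stump predicts a constant on each side of the threshold; a full tree repeats this split search recursively inside each side.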

Linear Regression Use Cases

The following are some typical use cases of linear regression:

Given some demographic description of a person, predict their monthly income.
Analyzing how marketing, pricing and promotions affect the sales of a product.
Given some description of a house (bathrooms, condition, view, floors), predict its cost.
Checking whether the funds a company has invested in marketing a particular brand have given a substantial return on investment.
Forecasting sales in future months by running a linear analysis on past monthly sales data.
Estimating the impact of the amount of rainfall on the number of fruits yielded.
Estimating the impact of product price on the number of sales.

Thanks for reading.

Zeynep Küçük
