Linear Regression

Webster University · Course 513 · Marketing · Jan 9, 2024 · 17 pages

Regression Analysis

LEARNING OBJECTIVES
1. Explain simple and multiple linear regression models.
2. Define a predictive regression model.
3. Differentiate between accuracy measures for predictive performance.
4. Assess predictive regression performance.
5. Discuss modeling of categorical variables.
6. Discuss model independent variable selection.

A large retail store would like to understand the impact of email promotions. To do this, the marketing analyst must predict sales amounts from customers that receive promotions by email. The store manager selected a sample of customers in the store's database and sent them promotional materials via email. Transaction data was then collected on the customers that made purchases. The transaction data included the amount spent by the customers responding to the promotional campaign, as well as other variables such as the last time they made a purchase from the store; their age, gender, and income level; and the day of the week.

How can the store manager use this data to identify which variables are most likely to predict the purchase amount of a returning customer? To answer this question, the store manager might use multiple regression. But what is multiple regression, and how do we determine which variables to examine? In the following pages, you will learn how to select variables and use regression models for predictions.

1. What Is Regression Modeling?

Regression modeling captures the strength of the relationship between a single numerical dependent (target) variable and one or more numerical or categorical predictor variables. For example, regression modeling could predict customer purchase spending based on the email promotion and income level. The variable being predicted is referred to as the dependent or target variable (Y). The variables used to make the prediction are called independent variables (X), also referred to as predictors or features.
The regression model in this example has a single numerical dependent variable (purchase amount) and two independent variables (email promotion and income level).
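To make this setup concrete, here is a minimal sketch in Python that fits such a model with ordinary least squares. The data values are purely illustrative, not from the chapter; the column layout (promotion flag and income) is an assumption about how the store's data might be coded.

```python
import numpy as np

# Hypothetical training data: purchase amount (Y) predicted from an
# email-promotion flag and income level (X). Values are made up.
X = np.array([
    [1, 1, 55_000],   # leading 1 = intercept term; promo received; income
    [1, 0, 42_000],
    [1, 1, 73_000],
    [1, 0, 61_000],
    [1, 1, 48_000],
    [1, 0, 35_000],
], dtype=float)
y = np.array([120.0, 60.0, 150.0, 80.0, 110.0, 50.0])

# Ordinary least squares: solve for b minimizing ||Xb - y||^2
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# b holds the intercept plus one weight per predictor; predictions are Xb.
y_hat = X @ b
```

The fitted weights in `b` play the role of the coefficients discussed below: one number per independent variable, telling us how much each predictor moves the predicted purchase amount.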
Exhibit 1 Example Independent and Dependent Variables

Using a regression model, we can identify which features (email promotion or income level) are more important in predicting customer purchase amount. Once the regression model has determined that one or more independent variables are useful predictors, the next step is to identify which independent variable is the strongest predictor, then the second strongest, and so on, if there are more than two independent variables.

In linear regression, the relationship between the independent and dependent variables is represented by a straight line that best fits the data. The goal of the regression line is to minimize the distances between the actual (observed) points (blue dots) and the regression line (red line), or what we refer to as the error of prediction, as shown in Exhibit 2.

Exhibit 2 Straight Line Linear Model
Simple Linear Regression

Simple linear regression is used when the focus is limited to a single numeric dependent variable and a single independent variable. In Exhibit 2, simple linear regression was used to examine the relationship between the size of homes (independent variable/horizontal axis) and the sales price (dependent variable/vertical axis). For example, in analyzing the effect of home size in square feet (x) on home sale price (y), a marketing analyst would propose home sales price as the target (dependent) variable and home square feet as the predictor (independent) variable. Exhibit 3 displays the simple linear regression model equation and describes each value.

Exhibit 3 Simple Linear Regression Model

ŷ = b0 + b1x + e

where:
ŷ = the predicted value of the dependent variable
b0 = the intercept: the value of ŷ we expect when x is zero
b1 = the slope of the line: the change in ŷ when x changes by a single unit of measure
x = the independent variable
e = the error in ŷ for observation i; accounts for the variability that is not explained by the linear relationship between x and ŷ

In this equation, b0 describes the estimated y-intercept and b1 describes the slope of the regression line (red line in Exhibit 2). The estimated y-intercept is the point at which the linear regression line crosses the vertical axis (see Exhibit 4).

Exhibit 4 y-Intercept
The ŷ is an estimate of the average value of y for each value of x. The error term (e) is a measure of the difference between the actual (observed) outcome and the predicted value based on the model estimation. The error term can also be thought of as the error in using regression to predict the dependent variable. A regression line always has an error term because independent variables are never perfect predictors of the dependent variable. Error terms tell us how certain we are in the estimate: the larger the error, the less certain we are about the estimation.

The information displayed in Exhibit 2 is based on 30 observations of home sales in King County, Washington. The relationship between the straight line and the dots shows the best fit of the line to the blue dots. Fit is measured using the vertical distances of the blue dots from the red regression line: each distance is squared, and the squared distances are summed. The line of best fit is the line positioned so that this sum of squared distances is as small as possible. The regression equation for the best fit of the line for the 30 observations is:

ŷ = –11,696 + 262.13 × Home Sq. Feet

With this simple regression model, the analyst can use a new home's square footage to estimate its price. For example, a home with square footage of 1,100 would be expected to have a price of $276,647. But many other variables may also affect the price of a house, such as the number of bedrooms, recent renovations, and so on. To determine whether these additional independent variables can predict the dependent variable price, we must use multiple regression models.
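The arithmetic in that example can be checked in a few lines of Python. This sketch uses the chapter's fitted coefficients; the function name is mine, introduced only for illustration.

```python
# Fitted simple regression from the 30 King County home sales:
#   price = -11,696 + 262.13 * square_feet
b0, b1 = -11_696, 262.13

def predict_price(sq_feet: float) -> float:
    """Point estimate of sale price from home size, using the fitted line."""
    return b0 + b1 * sq_feet

price = predict_price(1_100)
print(round(price))  # 276647, matching the text's $276,647 example
```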
Multiple regression models also have a single numerical dependent variable, but in addition, they have several independent variables to improve your predictions. What other variables might also affect the price a purchaser is willing to pay for a home? Examples of variables that might help you predict include the quality of the workmanship in the home, the customer's income level, the number of home shopping visits in the past 24 months, the number of homes for sale in a particular area, and so forth.

How is the regression line estimated? The most common approach is the ordinary least squares (OLS) method, which was summarized earlier. OLS minimizes the sum of squared errors; in this case, error represents the difference between the actual values and the predicted values, that is, the distances between the dots and the straight line. The OLS method is a good way to determine the best fit for a set of data. The OLS process calculates the weights b0 and b1 and uses them to estimate the dependent variable. This procedure minimizes the errors and, ultimately, the differences between the observed and predicted values of the dependent variable.
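The OLS calculation of b0 and b1 can be sketched directly from the standard closed-form formulas for one predictor (slope = covariance of x and y divided by variance of x). The data here is made up to lie exactly on a line so the result is easy to verify; this is a sketch of the method, not any particular software's routine.

```python
import numpy as np

def fit_ols(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Closed-form OLS for one predictor:
    b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    b0 = y_bar - b1 * x_bar
    """
    x_bar, y_bar = x.mean(), y.mean()
    b1 = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Illustrative points on the exact line y = 2x + 1 (not from the chapter)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

b0, b1 = fit_ols(x, y)
print(b0, b1)  # recovers intercept 1.0 and slope 2.0
```

Because these weights minimize the sum of squared vertical distances, any other line through the same points would produce a larger total squared error.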
Whether the regression model includes one independent variable (simple regression) or several independent variables (multiple regression), the process estimates the value of only a single dependent variable. As noted earlier, the straight line is located in a position that minimizes the distances from the actual points to the estimated line. For example, Exhibit 5 shows the difference between a line of best fit (a) and a random line (b). The goal is to find the line that produces the minimum sum of squared errors in estimating the distances from the points to the straight line.

Exhibit 5 Line of Best Fit

Multiple Linear Regression

Multiple regression is used to determine whether two or more independent variables are good predictors of a single dependent variable. For example, multiple regression could be used to determine whether several independent variables can improve the prediction of the single dependent variable sales. That is, what are some drivers (independent variables) that predict sales? In most instances, a marketing analyst could easily identify many possible independent variables that might predict sales, such as promotion, price, season, weather, location, and many more.

Evaluating the Ability of the Regression Model to Predict

A key metric for evaluating whether the regression model is a good predictor of the dependent variable is R². The R² measures the amount of variance in the dependent variable that is predicted by the independent variable(s). The R² value ranges between 0 and 1, and the closer the value is to 1, the better the prediction by the regression model. When the value is near 0, the regression model is not a good predictor of the dependent variable. A good fit of the straight line to the dots produces a tighter model (dots are consistently closer to the line) and a higher R², as shown in Exhibit 6.
A poor fit of the straight line to the dots would produce a looser model than in Exhibit 6 (dots are consistently farther from the line), and the R² would be much lower. The R² goodness-of-fit measure is important for descriptive and explanatory regressions. When predicting new observations, however, other evaluation criteria become important.
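The R² idea can be sketched in a few lines of Python using its standard definition, R² = 1 − SSE/SST. The numbers are illustrative, not taken from the exhibit.

```python
import numpy as np

def r_squared(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Share of the variance in y explained by the predictions y_hat."""
    sse = ((y - y_hat) ** 2).sum()      # unexplained (residual) variation
    sst = ((y - y.mean()) ** 2).sum()   # total variation in y
    return 1 - sse / sst

y = np.array([3.0, 5.0, 7.0, 9.0])
perfect = y.copy()                       # predictions exactly on the dots
rough = np.array([4.0, 4.0, 8.0, 8.0])   # a looser fit

print(r_squared(y, perfect))  # 1.0: a perfect fit explains all variance
print(r_squared(y, rough))    # 0.8: dots farther from the line, lower R²
```

The tighter the dots sit around the fitted line, the smaller SSE is relative to SST, which is exactly why a tight model in Exhibit 6 shows a high R².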
Exhibit 6 Regression Model Predicting R²

PRACTITIONER CORNER

Jessica Owens | Integrated Marketing Manager at Ntara

Jessica Owens is an integrated marketing manager at Ntara, a digital transformation agency. A marketing strategist and consultant, she leverages customer data to inform strategic marketing decisions, build better stories, and advocate new ideas to clients that lead to better customer experiences and increased sales. She has worked with brands like Simmons Beautyrest, Cardinal Innovations Healthcare, and a variety of global B2B manufacturers in chemical intermediates, flooring, transportation, and more. Jessica strives to help her clients understand, embrace, and connect data from all channels.