Simple linear regression

Purpose: To determine the relationship between the independent variable (IV) and the dependent variable (DV), and to predict the value of the dependent variable (Y) based on the value of the independent variable (X).
Requirement:

Scales of measurement for variables: DV, interval or ratio; IV, interval or ratio.

Descriptive:

Steps:

·         Create a scatter plot to identify the line of best fit. The purpose is to identify the trend of the data and to predict future values.
·         Identify the intercept and slope of the regression line.
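As a rough illustration of these two descriptive steps, here is a minimal Python sketch that computes the least-squares slope b1 = Sxy / Sxx and intercept b0 = ȳ - b1·x̄, then uses the fitted line to predict Y for a new X. The helper names `fit_line` and `predict` and the five (x, y) pairs are made up for illustration and do not come from any data set in this post.

```python
def fit_line(xs, ys):
    """Least-squares line y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy: sum of cross-products of deviations; Sxx: sum of squared x deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


def predict(b0, b1, x):
    """Predicted Y for a given X on the fitted line."""
    return b0 + b1 * x


# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)     # b0 is about 2.2, b1 is about 0.6
y_at_6 = predict(b0, b1, 6)   # about 5.8
```

Plotting `xs` against `ys` together with this line would show the trend the scatter plot step is looking for; the plot itself is left out so the sketch needs only the standard library.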

Inferential steps:

Steps:

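1.       Conduct an F-test to test the regression equation and a t-test to test the slope. The purpose of these tests is to determine whether there is a significant relationship between the IV and the DV.
2.       Identify the IV and the DV.
3.       State H0 and HA (for the slope, H0: the slope is 0, i.e. no linear relationship; HA: the slope is not 0).
4.       Set the significance level (alpha), for example 0.05, which corresponds to a 95% confidence level.
5.       Report F and Sig-F, and t and Sig-t for the slope.
6.       Decision: if Sig-t is less than or equal to alpha, reject the null hypothesis; if Sig-t is greater than alpha, fail to reject the null hypothesis.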
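The statistics behind steps 1, 5 and 6 can be sketched in Python, assuming the usual least-squares model with n - 2 error degrees of freedom. The slope t-statistic is t = b1 / sqrt(MSE / Sxx), and the F-statistic is F = SSR / MSE; in simple linear regression F = t², so Sig-F and Sig-t are the same value. Statistical software reports Sig (the p-value), but computing it needs the t or F distribution, which the Python standard library does not provide. So this hypothetical `slope_tests` helper returns only t, F and the degrees of freedom, and the hypothetical `decide` helper applies the step 6 rule to a Sig value you supply.

```python
import math


def slope_tests(xs, ys):
    """Return (t, F, df) for H0: slope = 0 in simple linear regression."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length lists with at least 3 points")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Error sum of squares (SSE) and mean squared error with n - 2 df
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    if sse == 0:
        raise ValueError("points lie exactly on a line; t and F are undefined")
    df = n - 2
    mse = sse / df
    # t-test for the slope: t = b1 / SE(b1), where SE(b1) = sqrt(MSE / Sxx)
    t = b1 / math.sqrt(mse / s_xx)
    # F-test for the equation: F = (SSR / 1) / MSE, where SSR = b1**2 * Sxx
    f = (b1 ** 2 * s_xx) / mse
    return t, f, df


def decide(sig, alpha=0.05):
    """Step 6: compare Sig (the p-value reported by software) with alpha."""
    return "reject H0" if sig <= alpha else "fail to reject H0"


# Made-up data for illustration: t is about 2.12, F is about 4.5 (t squared), df = 3
t, f, df = slope_tests([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For example, `decide(0.03)` returns `"reject H0"` and `decide(0.20)` returns `"fail to reject H0"` at the default alpha of 0.05.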

Types of relationships

Before proceeding, we must clarify what types of relationships we won’t study in this course, namely, deterministic (or functional) relationships. Here is an example of a deterministic relationship.

Figure: Fahrenheit vs. Celsius plot
Note that the observed (x, y) data points fall directly on a line. As you may remember, the relationship between degrees Fahrenheit and degrees Celsius is known to be:
Fahr = (9/5) × Cels + 32
That is, if you know the temperature in degrees Celsius, you can use this equation to determine the temperature in degrees Fahrenheit exactly.
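To see what "exactly" means numerically, the sketch below generates Fahrenheit values from a handful of Celsius readings using Fahr = (9/5) × Cels + 32 and then fits a least-squares line to those points. Because the relationship is deterministic, the fit recovers slope 1.8 and intercept 32, and every residual is zero apart from floating-point rounding. The helper names `c_to_f` and `least_squares` and the chosen temperatures are illustrative assumptions.

```python
def c_to_f(celsius):
    """Deterministic relationship: Fahr = (9/5) * Cels + 32."""
    return 9 / 5 * celsius + 32


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


cels = [-40, -10, 0, 20, 37, 100]
fahr = [c_to_f(c) for c in cels]
b0, b1 = least_squares(cels, fahr)
# b0 is 32 and b1 is 1.8 up to floating-point rounding; every residual is ~0
residuals = [f - (b0 + b1 * c) for c, f in zip(cels, fahr)]
```

A statistical relationship would instead leave nonzero residuals no matter which line is fitted.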
Here are some examples of other deterministic relationships that students from previous semesters have shared:
  • Circumference = π × diameter
  • Hooke’s Law: Y = α + βX, where Y = amount of stretch in a spring, and X = applied weight.
  • Ohm’s Law: I = V/r, where V = voltage applied, r = resistance, and I = current.
  • Boyle’s Law: For a constant temperature, P = α/V, where P = pressure, α = constant for each gas, and V = volume of gas.
For each of these deterministic relationships, the equation exactly describes the relationship between the two variables. This course does not examine deterministic relationships. Instead, we are interested in statistical relationships, in which the relationship between the variables is not perfect.
Here is an example of a statistical relationship. The response variable y is the mortality due to skin cancer (number of deaths per 10 million people), and the predictor variable x is the latitude (degrees North) at the center of each of 49 states in the U.S. (skincancer.txt). (The data were compiled in the 1950s, so Alaska and Hawaii were not yet states. Also, Washington, D.C. is included in the data set even though it is not technically a state.)

Figure: Skin cancer mortality vs. state latitude plot

You might anticipate that the farther north you lived in the U.S. (that is, the higher your latitude), the less exposed you’d be to the harmful rays of the sun and, therefore, the lower your risk of death due to skin cancer. The scatter plot supports such a hypothesis. There appears to be a negative linear relationship between latitude and mortality due to skin cancer, but the relationship is not perfect. Indeed, the plot exhibits some “trend,” but it also exhibits some “scatter.” Therefore, it is a statistical relationship, not a deterministic one.
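One common way to put a number on "trend" versus "scatter" is the Pearson correlation coefficient r = Sxy / sqrt(Sxx · Syy). Its sign gives the direction of the linear trend, and |r| equals 1 only when every point falls exactly on a line. The sketch below is an illustration only: the helper name `pearson_r` is hypothetical, and the latitude and mortality values are invented, not taken from skincancer.txt.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when x or y is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Invented values for illustration only (NOT the skincancer.txt data)
latitude = [30, 33, 36, 39, 42, 45]
mortality = [200, 185, 175, 150, 145, 120]
r_statistical = pearson_r(latitude, mortality)  # about -0.989: negative trend, some scatter

# A deterministic line gives r = 1 (up to floating-point rounding)
cels = [0, 10, 20, 30]
r_deterministic = pearson_r(cels, [9 / 5 * c + 32 for c in cels])
```

An r close to, but not equal to, -1 matches the picture described above: a clear negative trend with some scatter around it.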

Some other examples of statistical relationships might include:

  • Height and weight — as height increases, you’d expect the weight to increase, but not perfectly.
  • Alcohol consumed and blood alcohol content — as alcohol consumption increases, you’d expect one’s blood alcohol content to increase, but not perfectly.
  • Vital lung capacity and pack-years of smoking — as the amount of smoking increases (as quantified by the number of pack-years of smoking), you’d expect lung function (as quantified by vital lung capacity) to decrease, but not perfectly.
  • Driving speed and gas mileage — as driving speed increases, you’d expect gas mileage to decrease, but not perfectly.



