Optionetics Commentary

TECHNICAL TOOLBOX: Basic Statistics for Technical Analysts




Clare White, Optionetics.com
July 14, 2005


The stock market is a statistician’s dream on one hand and a nightmare on the other. The “dream” is the endless amount of data that can be grouped, tested, analyzed, manipulated, retested… and so on. That endlessness is part of the nightmare as well. While it is widely believed that statistical analysis of market data will not lead an analyst to reliable predictions of market trajectories, there remain important statistical applications for traders.

We’ll focus on two areas in this series: basic statistics for analyzing relationships between data sets, and statistics that can be used to determine whether the results of a system or a hypothesis are significant. We’ll also look at some more advanced techniques used by some market analysts. At the bottom of each article, we’ll provide foundational material for reference and review.

Building Blocks

There are three primary tools we’ll discuss first. They include measures of central tendency (the average is one common measure), variance and standard deviation. Excel formulas for each are provided here along with definitions and formulas. If you don’t love math, stay focused on the meaning of the terms we’re describing.

Common measures of central tendency include the following (also see example that follows):

| Measure | Description | Approach | Note |
|---|---|---|---|
| Mean (a.k.a. “average”) | The arithmetic average of a data set; this result may not actually be a value contained in the set. | Add up all of the data points and divide by the total number of data points in the group. | The mean includes extreme high or low data points, a.k.a. “outliers”, so a single extreme value can pull it noticeably. |
| Median | The middle value in a data series when sorted in ascending or descending order. If there is an even number of data points, it is the average of the 2 middle values. | Count the data points in the series and arrange them in ascending or descending order. For a data set with an odd number of values, use the middle data point; for one with an even number of values, average the two middle data points. | The median is largely unaffected by extreme data points or outliers. |
| Mode | The data value that occurs most often. | Arrange the data in ascending or descending order. Determine the data value(s) that occur most frequently. | A data set can have more than one mode when several values tie for the highest frequency. |

To better define our terms we’ll use the following sample data set, which is merely a collection of numbers, grouped in ascending order:

10 data points, {11, 12, 13, 14, 15, 15, 17, 17, 17 and 28}

| Measure | Description | Approach | Excel Formula |
|---|---|---|---|
| Mean (a.k.a. “average”) | The arithmetic average of a data set; this result may not actually be a value contained in the data set. | 11 + 12 + 13 + 14 + 15 + 15 + 17 + 17 + 17 + 28 = 159. Mean = 159 ÷ 10 = 15.9 | = average(value1, value2, …), or = average(range) |
| Median | The middle value in a data series that is sorted in ascending or descending order. If there is an even number of data points, it is the average of the 2 middle values. | Since there are 10 data points, the median is the average of data values 5 and 6, which are 15 and 15. Median = 15 | = median(value1, value2, …), or = median(range) |
| Mode | The data value that occurs most often. | The value 17 occurs 3 times while the value 15 occurs twice. Mode = 17 | = mode(value1, value2, …), or = mode(range) |

[Editor's note: the above table has been corrected as per Clare White's notation in Technical Toolbox: Data the Makes Sense, 7/21/05.]
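For readers who prefer a script to a spreadsheet, the same three measures can be sketched in Python. This is only an illustration and assumes Python's standard `statistics` module stands in for the Excel functions; the variable names are made up for the example.

```python
import statistics

# Sample data set from the article.
data = [11, 12, 13, 14, 15, 15, 17, 17, 17, 28]

# Mean: sum of the values divided by the number of values.
mean_value = statistics.mean(data)

# Median: middle value of the sorted data; with an even count,
# the two middle values are averaged.
median_value = statistics.median(data)

# Mode: the value that occurs most often.
mode_value = statistics.mode(data)

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

The results should agree with the table: a mean of 15.9, a median of 15 and a mode of 17.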

Generally you’ll use mean values when analyzing data; however, there are times when it is appropriate to remove outliers. Usually this happens when a data point is suspect, that is, when you have reason to believe the value is invalid. Keep in mind that a data point that is extreme is not necessarily invalid. There needs to be a reason to dismiss the value, e.g. a bad print for a trade. Besides using the median rather than the mean, there are other methods for removing outliers from calculations.
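To see how much a single extreme value matters, here is a small Python sketch. It assumes, purely for illustration, that the value 28 in the sample data set had been confirmed as a bad print; nothing in the data itself says so.

```python
import statistics

data = [11, 12, 13, 14, 15, 15, 17, 17, 17, 28]

# Hypothetical scenario: treat 28 as a confirmed bad print and drop it.
trimmed = [value for value in data if value != 28]

mean_all = statistics.mean(data)
mean_trimmed = statistics.mean(trimmed)
median_all = statistics.median(data)
median_trimmed = statistics.median(trimmed)

print(f"mean with outlier:      {mean_all:.2f}")
print(f"mean without outlier:   {mean_trimmed:.2f}")
print(f"median with outlier:    {median_all}")
print(f"median without outlier: {median_trimmed}")
```

The mean shifts when the extreme value is dropped while the median stays at 15, which is why the median is a common alternative when an outlier is suspected.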

Terminology

In order to discuss variance and standard deviation, we need to define a few more terms. When we look at data sets, there are two general categories we can consider: dependent data points and independent data points. Another term for “data point” is variable—that is, a value that is changing (like the closing daily value for a stock).

These two terms, dependent and independent variables, are relative. For instance, consider a data set composed of wind velocities recorded over a given day at one collection point. This data set consists of variables, that is, values that change. Consider a second data set that records temperatures throughout the day at the same times the wind velocities are recorded. Since changing temperatures will impact wind velocities, the temperature data represents the independent variable and the wind velocity data represents the dependent variable in our example.

Now consider a second example that uses the same wind velocity data and a new data set: the speed of a sailboat on a lake where the wind velocity is being recorded. The wind velocity now represents our independent variable while sailboat speed represents the dependent variable, because the speed of the sailboat is partially determined by wind speed.

Variance and Standard Deviation

We use the variance to measure how far data values lie from an expected value, with that expected value being the mean; it is based on the squared distances of the values from the mean. The standard deviation, the square root of the variance, describes how dispersed the data set is around this expected value in the same units as the data. So variance and standard deviation describe characteristics of the whole data set in relation to the average value of that data set.

Consider these two data sets:

Set A: 10 data points, {11, 12, 13, 14, 15, 15, 17, 17, 17 and 28}, and
Set B: 10 data points, {11, 22, 23, 34, 45, 55, 57, 67, 77 and 88}

Given the description of variance and standard deviation, which set has the higher variance (data values that are further from the mean) and which has the higher standard deviation (greater dispersion from expected value)? Here are results for each data set, which are sample sets (not an entire population):

 

| | Sample Variance | Sample Standard Deviation |
|---|---|---|
| Set A | 22.5 | 4.75 |
| Set B | 645.2 | 25.4 |
| Excel Formula | = var(range) | = stdev(range) |

[Editor's note: the above table has been corrected as per Clare White's notation in Technical Toolbox: Data the Makes Sense, 7/21/05.]
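The table values can be reproduced outside a spreadsheet. The sketch below assumes Python's standard `statistics` module, where `variance` and `stdev` are the sample versions, as counterparts to the Excel formulas; the dictionary layout is just a convenience for this example.

```python
import statistics

set_a = [11, 12, 13, 14, 15, 15, 17, 17, 17, 28]
set_b = [11, 22, 23, 34, 45, 55, 57, 67, 77, 88]

results = {}
for name, values in (("Set A", set_a), ("Set B", set_b)):
    # statistics.variance and statistics.stdev divide by (n - 1),
    # so they are the sample (not population) measures.
    results[name] = (statistics.variance(values), statistics.stdev(values))

for name, (variance, std_dev) in results.items():
    print(f"{name}: sample variance = {variance:.1f}, "
          f"sample standard deviation = {std_dev:.2f}")
```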

Even without having the actual formula for these two measurements, ideally you were able to select Set B based on its much larger range of data points. Here are the formulas when evaluating sample sets:

Sample Variance = Sum [(each value in set – the mean value)²] / (# of values in set – 1)

Sample Standard Deviation = Square root of the sample variance
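To make the two formulas concrete, here is a minimal Python sketch that follows them step by step rather than calling a library routine. The function names `sample_variance` and `sample_std_dev` are invented for this illustration.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    squared_deviations = [(value - mean) ** 2 for value in values]
    return sum(squared_deviations) / (n - 1)


def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


set_b = [11, 22, 23, 34, 45, 55, 57, 67, 77, 88]
print(f"variance: {sample_variance(set_b):.1f}")
print(f"standard deviation: {sample_std_dev(set_b):.1f}")
```

Squaring each deviation keeps values above and below the mean from cancelling each other out, and taking the square root at the end returns the result to the units of the original data.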

Homework

Determine the correct formulas for Mean, Median, Variance and Standard Deviation for your spreadsheet program. Learn how to export data from your chart package to a text file that is compatible with your spreadsheet program. Generally .txt or .csv files can be opened in a spreadsheet. The program will prompt you to select formatting, which is often comma or space delimited. Don’t be afraid to play around with it—if the data is not being properly filled into the spreadsheet program, close the file and try again.
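If you would rather script the homework than use a spreadsheet, the following Python sketch reads comma-delimited text and computes the four measures. The column names "Day" and "Close" and the values (the article's sample data set) are assumptions for illustration; a real chart package will produce its own layout.

```python
import csv
import io
import statistics

# Hypothetical export from a chart package (comma delimited).
exported_text = """Day,Close
1,11
2,12
3,13
4,14
5,15
6,15
7,17
8,17
9,17
10,28
"""

# For a real file, open it instead of using io.StringIO, e.g.
# open("export.csv", newline="") where the file name is hypothetical.
reader = csv.DictReader(io.StringIO(exported_text))
closes = [float(row["Close"]) for row in reader]

print("count:", len(closes))
print("mean:", statistics.mean(closes))
print("median:", statistics.median(closes))
print("sample variance:", round(statistics.variance(closes), 1))
print("sample standard deviation:", round(statistics.stdev(closes), 2))
```

If the numbers look wrong, check the delimiter and the column name first, just as you would recheck the import settings in a spreadsheet.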


Clare White
Staff Writer and Options Strategist
Optionetics.com ~ Your Options Education Site

Foundational Information

Measure of Central Tendency: Statistical lingo for a calculation used to determine a middle value for a data set. Common measures include the mean, median and mode.

Mean (average): The arithmetic average of a data set; this result may not actually be a value contained in the set.

Median: The middle value in a data series that is sorted in ascending or descending order. If there is an even number of data points, it is the average of the 2 middle values.

Mode: The data value that occurs most often.

Variable: A data value that is changing (not constant). Wind velocity values collected throughout the day are an example of a variable.

Dependent Variable: A data point whose value is partially or fully determined by another specific set of data.

Independent Variable: A data point whose value is not partially or fully determined by another specific set of data. It generally describes the variable whose value impacts the dependent variable.

Formulas for Mean, Variance and Standard Deviation using data set {3, 3, 4 and 6}:

Mean = Sum (values in set) / # of values in your set = (3 + 3 + 4 + 6) / 4 = 4

Population Variance = Sum [(each value in set – the mean value)²] / (# of values in set)

Population Variance = [(3 – 4)² + (3 – 4)² + (4 – 4)² + (6 – 4)²] / 4 = (1 + 1 + 0 + 4) / 4 = 1.5

Sample Variance = Sum [(each value in set – the mean value)²] / (# of values in set – 1)

Sample Variance = [(3 – 4)² + (3 – 4)² + (4 – 4)² + (6 – 4)²] / 3 = (1 + 1 + 0 + 4) / 3 = 2.0

Population Standard Deviation = Square root of the population variance

Population Standard Deviation = √1.5 = 1.22

Sample Standard Deviation = Square root of the sample variance

Sample Standard Deviation = √2.0 = 1.41
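The population and sample versions differ only in the divisor, which a short Python check makes visible. This sketch assumes the standard `statistics` module, where the `p` prefix marks the population versions.

```python
import statistics

data = [3, 3, 4, 6]

# Population measures divide the sum of squared deviations by n.
population_variance = statistics.pvariance(data)
population_std_dev = statistics.pstdev(data)

# Sample measures divide by (n - 1) instead.
sample_variance = statistics.variance(data)
sample_std_dev = statistics.stdev(data)

print(f"population variance: {population_variance}")
print(f"sample variance:     {sample_variance}")
print(f"population std dev:  {population_std_dev:.2f}")
print(f"sample std dev:      {sample_std_dev:.2f}")
```

Because the sample version divides by a smaller number, it is always at least as large as the population version for the same data.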


 

 


  