TECHNICAL TOOLBOX: Basic Statistics for Technical Analysts
July 14, 2005
The stock market is a statistician’s dream on one hand and a nightmare on the other. The “dream” is the endless supply of data that can be grouped, tested, analyzed, manipulated, retested, and so on. This is part of the nightmare as well. While it is widely believed that statistical analysis of market data will not let an analyst predict market trajectories, there remain important statistical applications for traders.
We’ll focus on two areas in this series: basic statistics for analyzing relationships between data sets, and statistics that can be used to determine whether the results of a system or a hypothesis are significant. We’ll also look at some more advanced techniques used by some market analysts. At the bottom of each article, we’ll provide foundational material for reference and review.
Building Blocks
There are three primary tools we’ll discuss first: measures of central tendency (the average is one common measure), variance and standard deviation. Definitions and Excel formulas for each are provided here. If you don’t love math, stay focused on the meaning of the terms we’re describing.
Common measures of central tendency include the following (also see example that follows):
Measure | Description | Approach | Note |
Mean (a.k.a. “average”) | The arithmetic average of a data set; this result may not actually be a value contained in the set. | Add up all of the data points and divide by the total number of data points in the group. | The mean includes extreme high or low data points, a.k.a. “outliers.” |
Median | The middle value in a data series when arranged in ascending or descending order. If there is an even number of data points, it is the average of the two middle values. | Identify the total number of data points in the series and arrange them in ascending or descending order. For a data set with an odd number of values, use the middle data point; for one with an even number of values, average the two middle data points. | The median is largely unaffected by extreme data points or outliers. |
Mode | The data value that occurs most often. | Arrange the data in ascending or descending order. Determine the data value(s) that occur most frequently. | A data set can have more than one mode. |
To better define our terms we’ll use the following sample data set, which is merely a collection of numbers, grouped in ascending order:
10 data points, {11, 12, 13, 14, 15, 15, 17, 17, 17 and 28}
Measure | Description | Approach | Excel Formula |
Mean (a.k.a. “average”) | The arithmetic average of a data set; this result may not actually be a value contained in the data set. | 11 + 12 + 13 + 14 + 15 + 15 + 17 + 17 + 17 + 28 = 159; 159 / 10 = 15.9 | = average(value1, value2, …) |
Median | The middle value in a data series that is arranged in ascending or descending order. If there is an even number of data points, it is the average of the two middle values. | Since there are 10 data points, the median is the average of data values 5 and 6, which are 15 and 15, so the median is 15. | = median(value1, value2, …) |
Mode | The data value that occurs most often. | The value 17 occurs 3 times while the value 15 occurs twice, so the mode is 17. | = mode(value1, value2, …) |
[Editor's note: the above table has been corrected as per Clare White's notation in Technical Toolbox: Data the Makes Sense, 7/21/05.]
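For readers who prefer a script to a spreadsheet, the same three measures can be checked in a few lines of Python. This is only a sketch: the article itself works in Excel, and the use of Python's standard `statistics` module and the variable names below are our own assumptions.

```python
import statistics

# Sample data set from the article, already in ascending order.
data = [11, 12, 13, 14, 15, 15, 17, 17, 17, 28]

# Mean: sum of the values (159) divided by the number of values (10).
mean_value = statistics.mean(data)

# Median: with 10 values, the library averages the 5th and 6th values (15 and 15).
median_value = statistics.median(data)

# Mode: the value that occurs most often (17 appears three times).
mode_value = statistics.mode(data)

print(f"mean={mean_value}, median={median_value}, mode={mode_value}")
```

These calls play the same role as the Excel functions in the table; only the tool differs.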
Generally you’ll use mean values when analyzing data; however, there are times when it is appropriate to remove outliers. Usually this will happen when you have a data point that is suspect, that is, when you feel the value is somehow invalid. Keep in mind that a data point that is extreme is not necessarily invalid. There needs to be a reason to dismiss the value, e.g., a bad print for a trade. In addition to using a median value rather than a mean value, there are other methods to remove outliers from calculations.
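To see why the choice between mean and median matters when an outlier is present, here is a small Python sketch using the sample data set. Treating the value 28 as a suspect print is purely a hypothetical for illustration; nothing in the data itself says it is invalid.

```python
import statistics

data = [11, 12, 13, 14, 15, 15, 17, 17, 17, 28]

# Hypothetical: suppose we had a reason to dismiss 28 as a bad print.
trimmed = [value for value in data if value != 28]

mean_all = statistics.mean(data)
mean_trimmed = statistics.mean(trimmed)
median_all = statistics.median(data)
median_trimmed = statistics.median(trimmed)

# The mean moves noticeably when the extreme value is dropped;
# the median stays at 15 either way.
print(f"mean:   {mean_all:.2f} -> {mean_trimmed:.2f}")
print(f"median: {median_all} -> {median_trimmed}")
```

The mean shifts by more than a full point while the median does not move, which is the sense in which the median is less sensitive to extreme values.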
Terminology
In order to discuss variance and standard deviation, we need to define a few more terms. When we look at data sets, there are two general categories we can consider: dependent data points and independent data points. Another term for “data point” is variable—that is, a value that is changing (like the closing daily value for a stock).
These two terms, dependent and independent variables, are relative. For instance, let’s consider a data set that is composed of wind velocities recorded over a given day at one collection point. This data set consists of variables, that is, values that change. Consider a second set of variable data that records temperatures throughout the day at the same times the wind velocities are recorded. Since changing temperatures will impact wind velocities, the temperature data represents independent variables and the wind velocities represent dependent variables for our example.
Now consider a second example that uses the same wind velocity data and a new data set: the speed of a sailboat on a lake where the wind velocity is being recorded. The wind velocity now represents our independent variable while sailboat speed represents the dependent variable. The speed of the sailboat is partially determined by wind speed.
Variance and Standard Deviation
We use the variance to measure how far away data values are from an expected value, with that expected value being the mean. The standard deviation is then used to describe how dispersed the data set is from this expected value. So variance and standard deviation describe characteristics of the whole data set in relation to the average value of that data set.
Consider these two data sets:
Set A: 10 data points, {11, 12, 13, 14, 15, 15, 17, 17, 17 and 28}, and
Set B: 10 data points, {11, 22, 23, 34, 45, 55, 57, 67, 77 and 88}
Given the description of variance and standard deviation, which set has the higher variance (data values that are further from the mean) and which has the higher standard deviation (greater dispersion from expected value)? Here are results for each data set, which are sample sets (not an entire population):
| Sample Variance | Sample Standard Deviation |
Set A | 22.5 | 4.75 |
Set B | 645.2 | 25.4 |
Excel Formula | = var(range) | = stdev(range) |
[Editor's note: the above table has been corrected as per Clare White's notation in Technical Toolbox: Data the Makes Sense, 7/21/05.]
Even without having the actual formula for these two measurements, ideally you were able to select Set B based on its much larger range of data points. Here are the formulas when evaluating sample sets:
Sample Variance = Sum [(each value in set – the mean value)^2] / (# of values in set – 1)
Sample Standard Deviation = Square root of the sample variance
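The two formulas above can be written out step by step in Python and applied to Set A and Set B. This is a minimal sketch; the function name `sample_variance` is our own, and the comparison against Python's standard `statistics` module is included only as a cross-check.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    mean = sum(values) / len(values)
    squared_deviations = [(value - mean) ** 2 for value in values]
    return sum(squared_deviations) / (len(values) - 1)

set_a = [11, 12, 13, 14, 15, 15, 17, 17, 17, 28]
set_b = [11, 22, 23, 34, 45, 55, 57, 67, 77, 88]

var_a = sample_variance(set_a)
var_b = sample_variance(set_b)

# The sample standard deviation is the square root of the sample variance.
std_a = math.sqrt(var_a)
std_b = math.sqrt(var_b)

# Cross-check the hand-written formula against the standard library.
assert math.isclose(var_a, statistics.variance(set_a))
assert math.isclose(var_b, statistics.variance(set_b))

print(f"Set A: variance={var_a:.1f}, standard deviation={std_a:.2f}")
print(f"Set B: variance={var_b:.1f}, standard deviation={std_b:.1f}")
```

Rounded, the results should agree with the table of sample variances and standard deviations given earlier.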
Homework
Determine the correct formulas for Mean, Median, Variance and Standard Deviation for your spreadsheet program. Learn how to export data from your chart package to a text file that is compatible with your spreadsheet program. Generally .txt or .csv files can be opened in a spreadsheet. The program will prompt you to select formatting, which is often comma or space delimited. Don’t be afraid to play around with it—if the data is not being properly filled into the spreadsheet program, close the file and try again.
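If you would rather do the homework in a script than in a spreadsheet, the sketch below shows one way to read comma-delimited text in Python. Everything about the layout is an assumption: the column names `Day` and `Close` are hypothetical, the values simply reuse the small reference data set {3, 3, 4, 6}, and a real chart package will export its own columns and formatting.

```python
import csv
import io
import statistics

# Hypothetical export from a chart package (comma delimited, with a header row).
# In practice you would open the exported .csv or .txt file instead.
exported_text = "Day,Close\n1,3\n2,3\n3,4\n4,6\n"

reader = csv.DictReader(io.StringIO(exported_text))

# Convert the text in the (assumed) Close column to numbers.
closes = [float(row["Close"]) for row in reader]

print("mean:", statistics.mean(closes))
print("median:", statistics.median(closes))
print("sample variance:", statistics.variance(closes))
print("sample standard deviation:", statistics.stdev(closes))
```

For a space-delimited file, `csv.DictReader` accepts a `delimiter=" "` argument. If the numbers do not load as expected, inspect the first few lines of the exported file and adjust the column name or delimiter.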
Clare White
Staff Writer and Options Strategist
Optionetics.com ~ Your Options Education Site
Foundational Information
Measure of Central Tendency: Statistical lingo for a calculation used to determine a middle value for a data set. Common measures include the mean, median and mode.
Mean (average): The arithmetic average of a data set; this result may not actually be a value contained in the set.
Median: The middle value in a data series that is arranged in ascending or descending order. If there is an even number of data points, it is the average of the two middle values.
Mode: The data value that occurs most often.
Variable: A data value that is changing (not constant). Wind velocity values collected throughout the day are an example of a variable.
Dependent Variable: A data point whose value is partially or fully determined by another specific set of data.
Independent Variable: A data point whose value is not partially or fully determined by another specific set of data. It generally describes the variable whose value impacts the dependent variable.
Formulas for Mean, Variance and Standard Deviation using data set {3, 3, 4 and 6}:
Mean = Sum (values in set) / # of values in your set = (3 + 3 + 4 + 6) / 4 = 4
Population Variance = Sum [(each value in set – the mean value)^2] / (# of values in set)
Population Variance = [(3 – 4)^2 + (3 – 4)^2 + (4 – 4)^2 + (6 – 4)^2] / 4 = (1 + 1 + 0 + 4) / 4 = 1.5
Sample Variance = Sum [(each value in set – the mean value)^2] / (# of values in set – 1)
Sample Variance = [(3 – 4)^2 + (3 – 4)^2 + (4 – 4)^2 + (6 – 4)^2] / 3 = (1 + 1 + 0 + 4) / 3 = 2.0
Population Standard Deviation = Square root of the population variance
Population Standard Deviation = √1.5 = 1.22
Sample Standard Deviation = Square root of the sample variance
Sample Standard Deviation = √2.0 = 1.41
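The population and sample versions differ only in the divisor (the number of values versus the number of values minus one). As a quick check of the arithmetic for the data set {3, 3, 4 and 6}, here is a short Python sketch. The `statistics` module is our choice of tool, not something the article prescribes.

```python
import statistics

data = [3, 3, 4, 6]

# Population versions divide the sum of squared deviations by n.
population_variance = statistics.pvariance(data)
population_std = statistics.pstdev(data)

# Sample versions divide by (n - 1).
sample_variance = statistics.variance(data)
sample_std = statistics.stdev(data)

print(f"population variance = {population_variance}")
print(f"population standard deviation = {population_std:.2f}")
print(f"sample variance = {sample_variance}")
print(f"sample standard deviation = {sample_std:.2f}")
```

Because the sample divisor is smaller, the sample variance and standard deviation are always at least as large as their population counterparts for the same data.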
A search of the article archives on Optionetics.com indicates the following:
Trading Statistics: 1 Article
To navigate to these articles go to Optionetics.com:
- Select the “Products & Services” tab,
- Select “Articles” in the menu right below the tab.
- Select “Article Archives” in the body of the web page,
- Enter the words “Trading Statistics” in the blank field,
- Select “Title” from the drop down menu,
- Search.
© Copyright 1995-2010 Optionetics. All rights reserved. This material is for personal use only. Republication and re-dissemination, including posting to newsgroups, is expressly prohibited without the prior written consent of Optionetics. Optionetics is a registered trademark of Optionetics, Inc.

