OmniTrader Forum : Portfolio Simulation Mode : Guidelines for Backtesting, Tuning and Evaluating

Jim Dean

Sage

Posts: 3022

Joined: 9/21/2006
Location: L'ville, GA

Subject : RE: Guidelines for Backtesting, Tuning and Evaluat
Posted : 6/23/2010 10:05 PM
Post #20858 - In reply to #20857

Here is the second part of my general guidelines re testing, tweaking and evaluating your Strategy, when it comes to:

Selection of Focus Lists ... for backtesting:

The prior post dealt with how filtering should be done for daily use, and emphasized the importance of using that same filtering (in the Filter Block) for backtesting.

But there is another VERY IMPORTANT issue here ... even WITH the same filters used, you need to avoid "CURVE FITTING" effects as you "tweak" your strategy. This can be done through two mechanisms ... BOTH should be used ... one related to Focus Lists and the other related to Time Periods.

Let's say that you start with a massive universe of 4500 symbols, and your OmniScript technical filters have reduced that list to 500 candidates. The question is, when you are using Strategy Wizard (an excellent addon to purchase!) to tweak the param's of your Strategy, or when you are using Portfolio Simulator to determine the best position-sizing rules, should you use that same dynamic list of (approx) 500 symbols, for all those "development" runs?

The answer is simple. NO.

If you use the same list, even if it's dynamic, then your SW and PS experiments will naturally bias themselves to enhance the "good" trades for those symbol-sets, and to minimize the "bad" trades ... during whatever historical period you have chosen.

So, after extensive fiddling around, you find that you get a soopa-doopah equity curve, with low drawdown and reasonable numbers of trades ... when in fact you've actually (to some degree) "fit" the way THOSE STOCKS performed during that time.

The solution is simple. Use different lists for the testing/tweaking activities than you do to "prove" your conclusions are robust and viable for as-yet-unknown FUTURE situations.
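To see why the same list can't be used for both jobs, here is a minimal Python sketch (not OmniScript, and not anything built into OmniTrader). It assumes a toy model in which every parameter set has NO real edge at all: each per-symbol result is pure random noise. The function name and all the numbers are made up for illustration. It picks the "best" parameter set on one list and then re-scores that same set on a second list it has never seen.

```python
import random

def tune_then_forward_test(n_param_sets=200, n_symbols=100, seed=0):
    """Pick the best of many no-edge parameter sets on list A, then re-score it on list B."""
    rng = random.Random(seed)

    def avg_return():
        # Average result over one symbol list; pure noise, so the true edge is zero.
        return sum(rng.gauss(0.0, 1.0) for _ in range(n_symbols)) / n_symbols

    # Each parameter set gets one score on the tuning list and one on the proof list.
    scores = [(avg_return(), avg_return()) for _ in range(n_param_sets)]
    # The "winner" is chosen using the tuning-list score only.
    tuned, forward = max(scores, key=lambda pair: pair[0])
    return tuned, forward

if __name__ == "__main__":
    tuned, forward = tune_then_forward_test()
    print(f"tuning list: {tuned:+.3f}   proof list: {forward:+.3f}")
```

In this toy setup the winner always looks profitable on the list it was chosen on, even though nothing here has an edge, while its score on the untouched list is just another random draw around zero. That gap is the "fit" to THOSE STOCKS.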

And, using the power of OmniScript, it's actually pretty easy to come up with the alternative lists. Here's how:

Let's say that you have a dynamic-filtered list of 500 symbols, and you want to create 5 "subsidiary" sets of 100 symbols apiece to use during testing. I suggest that you reserve two of these five for the "fiddling" process, and the other three for "proof" testing ... equivalent to the idea of "forward" testing.

So ... how to choose the five sets without introducing some kind of selection bias in doing so? Answer ... use the ALPHABET. I'll describe how this is done in OmniScan ... it can be done manually, but it's a lot easier with OScan's tools.

1. create your custom list of 4500 symbols by putting SP400+500+600, Optionable, and Russell 3000 lists into the Starting Population box. I call this list the SPOR, btw.

2. create a TECHNICAL FILTER formula that has this general form:
Left(Symbol,1) >= "A" and Left(Symbol,1) <= "C"
This formula extracts the first letter of the symbol, and checks if that letter is in the range A to C ... the only symbols that "pass" this filter are ones that begin with A, B or C. Of course you can adjust the letters to be whatever you want.

3. ADJUST the boundary letters in the filter until OScan tells you the final list has about 100 symbols in it ... let's say that A-D was the result. Write that down for future reference and change the filter to start with the next letter "E", and search for the end-letter to get a second list of 100 symbols. Repeat until you have five lists of approximately equal size. For instance, they might be bounded by the letters A-D, E-H, I-P, Q-T, and U-Z. Note that, since a symbol's first letter should have little or nothing to do with how it trades, this is for all intents and purposes a RANDOM selection.
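If you happen to have the filtered symbols exported to a plain list, the boundary-letter hunt in step 3 can be sketched outside OmniScan. This is a rough Python illustration, not an OmniScan feature: `letter_ranges` and `in_range` are hypothetical helper names, and the approach (cut wherever the running count crosses the next equal share) is just one reasonable way to do it.

```python
from collections import Counter
import string

def in_range(symbol, first, last):
    """Python equivalent of: Left(Symbol,1) >= first and Left(Symbol,1) <= last."""
    return first <= symbol[:1].upper() <= last

def letter_ranges(symbols, n_lists=5):
    """Split A-Z into consecutive letter ranges holding roughly equal symbol counts."""
    counts = Counter(s[0].upper() for s in symbols if s)
    # Symbols that do not start with a letter are ignored.
    total = sum(counts[c] for c in string.ascii_uppercase)
    ranges, start, running = [], "A", 0
    for letter in string.ascii_uppercase:
        running += counts[letter]
        target = total * (len(ranges) + 1) / n_lists
        # Close the current range once the running count reaches the next equal share.
        if len(ranges) < n_lists - 1 and letter < "Z" and running >= target:
            ranges.append((start, letter))
            start = chr(ord(letter) + 1)
    ranges.append((start, "Z"))  # the last range takes whatever letters remain
    return ranges
```

Each `(first, last)` pair it returns corresponds to one pair of boundary letters to type into the filter formula from step 2. If the symbols are bunched into very few letters, it may return fewer ranges than asked for.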

Pick one of these, always the same one, for work with SW. Once you think you've got something good from SW re the param's, do a simple forward test (no adjustments or training allowed!) using one of the other lists (always the same one). Then, once your SW work is done, use a THIRD list for your testing and tweaking in PS (these tests use the param's you chose from the SW devel) ... and use a FOURTH list to forward-test those conclusions. Reserve the fifth list for other stuff you come up with to try.
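The bookkeeping is easy to get wrong, so here is a small Python sketch of one way to pin it down. Which letter range goes with which job is purely an assumption for illustration (it reuses the example ranges A-D, E-H, I-P, Q-T, U-Z), and `ROLES`, `role_for` and `ranges_are_disjoint` are hypothetical names.

```python
# Assumed assignment of the five example letter ranges to the jobs described above.
ROLES = {
    "SW tuning": ("A", "D"),
    "SW forward test": ("E", "H"),
    "PS tuning": ("I", "P"),
    "PS forward test": ("Q", "T"),
    "reserve": ("U", "Z"),
}

def role_for(symbol, roles=ROLES):
    """Return the job whose letter range contains the symbol's first letter, else None."""
    first = symbol[:1].upper()
    for role, (lo, hi) in roles.items():
        if lo <= first <= hi:
            return role
    return None

def ranges_are_disjoint(roles=ROLES):
    """True if no letter belongs to two jobs, so no symbol is tuned AND proof-tested."""
    spans = sorted(roles.values())
    return all(prev[1] < nxt[0] for prev, nxt in zip(spans, spans[1:]))
```

The point of the disjointness check is the "always the same one" rule: once a range has been used for tuning, it should never show up as a forward-test list.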

If you follow this method, or something similar to it, you will for all intents and purposes be ELIMINATING "curve fitting" relative to symbol-specific performance. (Curve fitting relative to Time Periods is the other mechanism, and it needs its own treatment.)

Also note that you can CHANGE the lists, if you want them to be more than 100 symbols each. If you wanted to split the 500 in half, yet still do the process I described with four DIFFERENT lists (plus a fifth in reserve), you could make the filters be:

List one: Left(Symbol,1) <= "M"
List two: Left(Symbol,1) >= "N"
List three:
Left(Symbol,1) <= "F" OR
(Left(Symbol,1) >= "N" and Left(Symbol,1) <= "T")
List four:
(Left(Symbol,1) >= "G" and Left(Symbol,1) <= "M") OR
Left(Symbol,1) >= "U"
List five:
Left(Symbol,1) = "B" OR
Left(Symbol,1) = "D" OR
Left(Symbol,1) = "F" OR
Left(Symbol,1) = "H" OR
Left(Symbol,1) = "J" OR
Left(Symbol,1) = "L" OR
Left(Symbol,1) = "N" OR
Left(Symbol,1) = "P" OR
Left(Symbol,1) = "R" OR
Left(Symbol,1) = "T" OR
Left(Symbol,1) = "V" OR
Left(Symbol,1) = "X" OR
Left(Symbol,1) = "Z"

... a nearly infinite number of combo's are possible.

This filter would go into the FILTER BLOCK when testing.
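A quick way to sanity-check a set of half-split filters before relying on them is to run each one over the 26 letters. The Python sketch below mirrors the five formulas as plain predicates. It assumes lists three and four are meant to be complementary halves, so list four's upper range is taken as U-Z, and it assumes every symbol starts with a letter. `FILTERS` and `letters_passing` are hypothetical names.

```python
import string

def first(symbol):
    """First character of the symbol, upper-cased: the Python stand-in for Left(Symbol,1)."""
    return symbol[:1].upper()

FILTERS = {
    "one":   lambda s: first(s) <= "M",
    "two":   lambda s: first(s) >= "N",
    "three": lambda s: first(s) <= "F" or "N" <= first(s) <= "T",
    # Assumption: the upper range starts at "U" so that four is the complement of three.
    "four":  lambda s: "G" <= first(s) <= "M" or first(s) >= "U",
    "five":  lambda s: first(s) in set("BDFHJLNPRTVXZ"),
}

def letters_passing(name):
    """Set of starting letters that the named filter lets through."""
    return {c for c in string.ascii_uppercase if FILTERS[name](c)}

if __name__ == "__main__":
    for name in FILTERS:
        print(name, "".join(sorted(letters_passing(name))))
```

Lists one/two and three/four should each cover the alphabet with no letter in common, which is what makes one half usable as the forward test for the other. Equal letter counts do not guarantee equal symbol counts, so the real list sizes still need to be checked in OScan.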

Again ... if you haven't "heard of this method" before ... well, that might explain why so many "black box" soopa doopa systems are out there in the trading world, that don't seem to "pan out" in real life.