OmniTrader 2017 Upgrade Forums
Portfolio Simulation Mode
Guidelines for Backtesting, Tuning and Evaluating
Last Activity 6/13/2018 11:40 AM
Location: L'ville, GA
OmniTrader offers superb tools for developing complete, robust Strategies that we can pattern our trading on. Every "step", from initial Focus List selection to detailed Trade Plan execution, is part of the process. There are additional tools such as Seasonality and Group Trader to help us know which strategies and which focus lists might be most appropriate as time moves on.
There are two complementary tools, Strategy Wizard and Portfolio Simulator, which help us optimize a given approach by testing and tweaking the details of the Strategy ... and, just as importantly (or more so), the allocation method we use for position-sizing. Both have a "starting point" ... the core Strategy, a focus-list, and a period of time (actually, two periods of time).
I'd like to provide some very general advice and explanation here - others may want to follow up with more detailed information. Sometimes, we can fall into the trap of "just trust the computer" mode. We see a button, we push it, we look at the results, we try another button, and so forth.
But we often shortchange the most important tool - the one that makes the difference between this process being a narcissistic hobbyhorse waste of time versus something that truly helps us build understanding, confidence, greater profitability, and more consistent performance ... that most important tool is our *brain*.
Nutshell about all that will follow:
If you don't understand how a tool works, don't use it!
If you do understand how it works, use it wisely!
[Edited by Jim Dean on 6/23/2010 5:52 PM]
Location: L'ville, GA
The three starting-point components for testing and tweaking are:
1. the Strategy
2. the Focus List
3. the Time Period
This post will address the Strategy.
Since most of the OT package and most of the addons are dealing with Strategy alternatives, it would be a bit much to try to discuss that component here. However, please review the two nutshell statements at the end of the prior post. The step regarding Strategy choice and use that most people overlook is the LEARNING and UNDERSTANDING process.
It's so tempting to get that shiny new gizmo (ok so it's a boring flat CD that you load in your computer and put away on a dusty shelf somewhere, but humor me here :~) ... and immediately hook it up by just opening up the Strategy list in ToDo and putting checks in the boxes of whatever shiny new Strategy variations it contains, and presuming, based on the glowing reports of the marvelous marketing brochure, that by using it you'll be rich beyond your wildest dreams within a year! Whoopee!
But ... it doesn't work that way. If you have not learned this yet, I hope that you are not trading with any money that is important to you.
Before you can effectively use a new Strategy, you need to STUDY IT. Start off by (gasp) reading the manual - no, the brochure is not the manual ;~) ... it's found by pressing the Help button. Read it through fully once, then go back over it slowly, and play with the individual features that it discusses. Play with it in a vacuum ... don't activate other Strategies or do any major surgery on it. Not yet, anyways :~)
One of two things will come out of this process, if you do a thorough job of it (more likely, a mixture of the two):
1. you will, by fiddling around with it in a controlled way, gradually come to understand not just how it's supposed to work in an ideal situation, but how it really DOES work. This is done primarily by looking at CHARTS and the VOTELINE ... lots and lots of charts, playing with parameters as you go. Seemingly-odd things might happen ... that should impel you to dig even deeper.
... OR ...
2. you will generate a list of QUESTIONS - hopefully ones that pinpoint areas of confusion ... simple things like definitions of words, or involved stuff like interactions of input variables. Questions are GOOD. Bring your specific questions to the Forum ... the (eventual) answers that will come most likely will assist many other folks as well.
The goal is to not only intellectually understand how the strategy works, but to develop an appreciation and gut feel for how it performs in a wide variety of different market conditions.
Take NOTES as you go.
Anything less than considering at least a couple hundred charts is probably not enough.
And, none of this has anything to do with backtest reports, optimization of parameters, or interaction with other strategies.
Details about what to look for and what to play with vary widely for different strategies and for different components of the strategy. The best general guideline is to fiddle with ONE THING AT A TIME, till you understand it, then move on to the next fiddle-factor.
This process actually can be fun! As long as your goal is to UNDERSTAND things, regardless of whether they work out well or not.
... I'll continue with general comments about the other two areas, later this evening ...
[Edited by Jim Dean on 6/23/2010 5:52 PM]
Location: Lewiston, Maine 04240
I totally agree with what you are saying here.
Along with my personal Strategies (9 looking for Tops and 9 looking for Lows), I must look at 300 charts, with 4 different indicators, each and every day!
But these efforts often pay GREAT DIVIDENDS, so a little work now will result in much better stock picks and therefore better trades!
Just my 2 cents.
Location: L'ville, GA
The second starting-point component affecting testing and tweaking is:
the Focus-List ... this post is part one of two, about them:
The first point is simple ... use a focus-list that is representative of what you intend to trade, when you TEST the strategy.
However, be very careful to AVOID using the SAME Focus-List for your TWEAKING process. The danger is that you may "over-fit" the strategy to a particular list.
Much has been said in other posts about "survivorship bias" ... that is, the danger of using symbols in a present-day index such as the SP500 as the basis for testing and tweaking Strategies on past data, for example over the past ten years.
There is definitely a bias ... how much of a bias is arguable ... but it's definitely true that by using a list of stocks that you know, TODAY, are "high class" ... during years BEFORE the time that they had already been identified as a part of that "class" ... that you are carrying that "future" information back into the past.
I can't take the time to discuss this in depth here, but suffice it to say that this is by no means a "cut and dried" issue. For instance, that very same action might bias AGAINST short trades or a strategy that is supposed to work with vacillating stocks. Furthermore, if the hold time is only a few days or a few weeks, then one of the basic arguments of Technical Analysis is that fundamentals don't matter for such short hold times ... and existence in the SP500 certainly has more to do with fundamentals than technicals.
But, I digress. My point has NOTHING to do with survivorship bias ... although that is a factor which the solution below will serve to ameliorate to some degree, as well.
Too often, a focus list is just based on some very simplistic choice ... "hmm, let's use the SP500 ... or maybe just the SP100, to save time ... or maybe the Russell 1000, to be more exhaustive". The choice is usually made out of CONVENIENCE, rather than a carefully-considered rationale about what KINDS of stocks are best for THIS STRATEGY.
I am a strong, STRONG advocate for doing extensive, pattern-based PRE-FILTERING of stocks, based on price and volume data ONLY ... starting from as big a universe as possible to help eliminate survivorship issues, AND to produce more viable candidates.
How does this work? Again, way way too much to discuss here (we allocate a full multi-session Topic on this in AOTC ;~) ... but let me give you some examples:
1. You want stocks that have RECENTLY DEMONSTRATED adequate liquidity to support the entry and exit of the POSITION SIZE that you intend to use. An OmniScript formula can be written to filter for this.
2. You want stocks that have RECENTLY DEMONSTRATED reasonable reward-potential (net price movement over typical hold-time) in comparison to the risk-levels (volatility) during that time. An OmniScript formula can be written to filter for this.
3. IF you are playing "trends", you want stocks that have RECENTLY DEMONSTRATED a tendency to move smoothly as opposed to in big jumps, which rarely can be predicted ahead of time or captured. An OmniScript formula can be written to filter for this.
4. You want stocks whose RECENT price/share is within an appropriate range to meet your capital limitations, allowing you enough shares to have some flexibility re partial exits or scaling-in, but not requiring huge numbers of shares (making slippage an issue) in order to obtain your desired financial position in that security. An OmniScript formula can be written to filter for this.
5. You want stocks that have RECENTLY DEMONSTRATED, in the bar-periods you plan to trade, a tendency not to have "long tails" vs body-size, especially in relation to the necessary net-move you are looking for. An OmniScript formula can be written to filter for this.
... Of course there are more considerations, but these are some biggies.
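To make examples 1 and 4 a bit more concrete, here is a minimal sketch written in Python rather than OmniScript, just to show the logic. The function name, the 20-bar lookback, the price range and the 1% volume fraction are all hypothetical placeholders, not recommendations, and an actual OmniScript filter would be written differently.

```python
from statistics import mean


def passes_prefilter(closes, volumes, position_dollars,
                     min_price=10.0, max_price=100.0,
                     max_volume_fraction=0.01, lookback=20):
    """Return True if the most recent `lookback` bars pass two checks.

    Every threshold here is a hypothetical placeholder.
    """
    # Not enough recent history means no evidence either way: reject.
    if len(closes) < lookback or len(volumes) < lookback:
        return False
    recent_closes = closes[-lookback:]
    recent_volumes = volumes[-lookback:]

    # Example 4: the latest price/share must sit inside the chosen range.
    last_close = recent_closes[-1]
    if not (min_price <= last_close <= max_price):
        return False

    # Example 1: the intended position must be a small fraction of the
    # average recent dollar volume, so entry and exit are practical.
    avg_dollar_volume = mean(
        [c * v for c, v in zip(recent_closes, recent_volumes)]
    )
    return position_dollars <= max_volume_fraction * avg_dollar_volume
```

The same shape (a recent window, a measurement, a threshold) would carry over to the reward/risk, smoothness and tail-length examples.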
Each of these can be addressed by an OmniScript filter ... do-able within OT Basic ... but the filters can be more robust (and easier to work with) if you own OT Pro - this allows them to be saved as small compiled functions.
The filters should be used in OmniScan for daily selection of the Focus List you use ... and should be used in the Filter Block for any backtesting.
If you start with a VERY LARGE list ... such as all Optionable stocks ... or the sum of the Russell 3000 + SP1500 + Optionable ... that collection of about 4500 stocks can be filtered down SOLELY by the technical formulae above ... if not enough symbols appear - maybe you should not be trading in that market! ... if too many symbols appear - maybe you should be tightening the parameters of the filters more!
The point is, these filters (in the Filter Block) are applied EVERY DAY in the backtesting runs for THAT DAY ... there is no future bias. All of this is possible WITHOUT OmniScan ... tho OScan will significantly speed up your day-to-day USE of this method.
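The "no future bias" idea can be sketched as follows, again in Python rather than OmniScript. The assumption is that the filter for a given day only ever sees bars up to and including that day. The names `focus_list_on`, `history` and `passes` are hypothetical and are not part of OmniTrader.

```python
def focus_list_on(day_index, history, passes, lookback=20):
    """Return the symbols that pass on `day_index`, using past bars only.

    history: dict mapping symbol -> list of closes, oldest first.
    passes:  function taking a window of closes and returning True/False.
    """
    selected = []
    for symbol, closes in history.items():
        # The slice stops at day_index, so later bars are never consulted.
        window = closes[max(0, day_index - lookback + 1): day_index + 1]
        if len(window) == lookback and passes(window):
            selected.append(symbol)
    return sorted(selected)
```

Calling this once per day of the backtest produces a focus list that changes over time, the same way the daily scan would have in real life.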
If this all sounds a bit out of the ordinary ... well, it is. Not many books or articles are written about it. I consider this robust pattern-prefiltering to be a significant "trading edge".
So - the point is ... if your focus list is DYNAMIC in real-life, based on a huge starting point that is logically filtered, then as long as your backtest uses the same logic, you will eliminate a lot of "survivor bias" issues ... AND you will find that your trades work more profitably!
... the next post deals with a second important issue re working with focus lists ...
Location: L'ville, GA
Here is the second part of my general guidelines re testing, tweaking and evaluating your Strategy, when it comes to:
Selection of Focus Lists ... for backtesting:
The prior post dealt with how filtering should be done for daily use, and emphasized the importance of using that same filtering (in the Filter Block) for backtesting.
But there is another VERY IMPORTANT issue here ... even WITH the same filters used, you need to avoid "CURVE FITTING" effects as you "tweak" your strategy. This can be done through two mechanisms ... BOTH should be used ... one related to Focus Lists and the other related to Time Periods.
Let's say that you start with a massive universe of 4500 symbols, and your OmniScript technical filters have reduced that list to 500 candidates. The question is, when you are using Strategy Wizard (an excellent addon to purchase!) to tweak the param's of your Strategy, or when you are using Portfolio Simulator to determine the best position-sizing rules, should you use that same dynamic list of (approx) 500 symbols, for all those "development" runs?
The answer is simple. NO.
If you use the same list, even if it's dynamic, then your SW and PS experiments will naturally bias themselves to enhance the "good" trades for those symbol-sets, and to minimize the "bad" trades ... during whatever historical period you have chosen.
So, after extensive fiddling around, you find that you get a soopa-doopah equity curve, with low drawdown and reasonable numbers of trades ... when in fact you've actually (to some degree) "fit" the way THOSE STOCKS performed during that time.
The solution is simple. Use different lists for the testing/tweaking activities than you do to "prove" your conclusions are robust and viable for as-yet-unknown FUTURE situations.
And, using the power of OmniScript, it's actually pretty easy to come up with the alternative lists. Here's how:
Let's say that you have a dynamic-filtered list of 500 symbols, and you want to create 5 "subsidiary" sets of 100 symbols apiece to use during testing. I suggest that you reserve two of these five for the "fiddling" process, and the other three for "proof" testing ... equivalent to the idea of "forward" testing.
So ... how to choose the five sets? Without introducing some kind of survivorship bias in doing so? Answer ... use the ALPHABET. I'll describe how this is done in OmniScan - can be done manually but a lot easier with OScan's tools.
1. create your custom list of 4500 symbols by putting SP400+500+600, Optionable, and Russell 3000 lists into the Starting Population box. I call this list the SPOR, btw.
2. create a TECHNICAL FILTER formula that has this general form:
Left(Symbol,1) >= "A" and Left(Symbol,1) <= "C"
This formula extracts the first letter of the symbol, and checks if that letter is in the range A to C ... the only symbols that "pass" this filter are ones that begin with A, B or C. Of course you can adjust the letters to be whatever you want.
3. ADJUST the boundary letters in the filter until OScan tells you the final list has about 100 symbols in it ... let's say that A-D was the result. Write that down for future reference and change the filter to start with the next letter "E", and search for the end-letter to get a second list of 100 symbols. Repeat until you have five lists of approximately equal size. For instance, they might be bounded by the letters A-D, E-H, I-P, Q-T, and U-Z. Note that for all intents and purposes this is a RANDOM selection.
Pick one of these, always the same one, for work with SW. Once you think you've got something good from SW re the param's, do a simple forward test (no adjustments or training allowed!) using one of the other lists (always the same one). Then, once your SW work is done, use a THIRD list for your testing and tweaking in PS (these tests use the param's you chose from the SW devel) ... and use a FOURTH list to forward-test those conclusions. Reserve the fifth list for other stuff you come up with to try.
If you follow this method, or something similar to it, you will be ELIMINATING "curve fitting" relative to symbol-specific performance, for all intents and purposes.
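The boundary-letter hunt in step 3 can also be automated outside of OScan. Below is a minimal Python sketch, assuming you can export the filtered symbol list as plain text. `letter_ranges` and `in_range` are hypothetical helper names; `in_range` mirrors the `Left(Symbol,1)` comparison shown above.

```python
from collections import Counter
import string


def letter_ranges(symbols, n_lists):
    """Split A-Z into up to `n_lists` ranges holding roughly equal counts."""
    counts = Counter(s[0].upper() for s in symbols if s and s[0].isalpha())
    total = sum(counts.values())
    ranges = []
    start = None
    running = 0
    for letter in string.ascii_uppercase:
        if start is None:
            start = letter
        running += counts.get(letter, 0)
        # Close this range once the cumulative count reaches its share,
        # but always leave the final range open so it can run to "Z".
        target = total * (len(ranges) + 1) / n_lists
        if running >= target and len(ranges) < n_lists - 1:
            ranges.append((start, letter))
            start = None
    if start is not None:
        ranges.append((start, "Z"))
    return ranges


def in_range(symbol, low, high):
    """True if the symbol's first letter falls between low and high."""
    return low <= symbol[0].upper() <= high
```

The resulting letter pairs are what you would then type into the filter formula, one pair per list. Sizes will only be approximately equal, since a whole letter always goes to one list.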
Also note that you can CHANGE the lists, if you want them to be more than 100 symbols each. If you wanted to split the 500 in half, yet still do the process I described with four DIFFERENT lists, you could make the four filters be:
List one: Left(Symbol,1) <= "M"
List two: Left(Symbol,1) >= "N"
List three: Left(Symbol,1) <= "F" OR
(Left(Symbol,1) >= "N" and Left(Symbol,1) <= "T")
List four: (Left(Symbol,1) >= "G" and Left(Symbol,1) <= "M") OR
Left(Symbol,1) >= "U"
Or, for an every-other-letter split:
Left(Symbol,1) = "B" OR
Left(Symbol,1) = "D" OR
Left(Symbol,1) = "F" OR
Left(Symbol,1) = "H" OR
Left(Symbol,1) = "J" OR
Left(Symbol,1) = "L" OR
Left(Symbol,1) = "N" OR
Left(Symbol,1) = "P" OR
Left(Symbol,1) = "R" OR
Left(Symbol,1) = "T" OR
Left(Symbol,1) = "V" OR
Left(Symbol,1) = "X" OR
Left(Symbol,1) = "Z"
... a nearly infinite number of combo's are possible.
This filter would go into the FILTER BLOCK when testing.
Again ... if you haven't "heard of this method" before ... well, that might explain why so many "black box" soopa doopa systems are out there in the trading world, that don't seem to "pan out" in real life.
Location: L'ville, GA
And, the final installment re guidelines for testing, tweaking and evaluating your Strategy, in relation to:
Selection of Time Periods ... for backtesting vs forward-testing:
First, terminology: by "Time Periods" I'm not referring to daily vs weekly vs 5min or 30min bars, but rather the historical chunk of sequential bars that you use to run the tweaks and tests over, for your given Focus List and Strategy.
"Back-testing" refers to the tests done while you are tweaking and optimizing things. "Forward-testing" refers to proof-demo tests that USE the final adjustments. These should be DIFFERENT, NON-OVERLAPPING periods of time. It is not necessary that they be chronologically sequential, nor that the back period is chronologically prior to the forward period. The labels are conveniences. The point is that they are DIFFERENT data-sets of OHLCV bars.
Just the definitions alone should tell you the key point. The same kind of rationale used for separating the Focus Lists into subsets for tweaking vs proof-testing applies here.
Why, you might ask, is this necessary, if you are ALSO using different symbol-lists? Isn't that sufficient? Can't you use the SAME time-period for back and forward testing (tweaking vs proving), if the FL symbols are different?
NO. (well - yes you can, but it's not wise to)
Reason ... even though you are using different symbol-sets, the MARKET FORCES during that timeframe tend to PUSH ALL THE SYMBOLS in the same general way ... many gurus argue that 30-50% of the average symbol's price-action is due to the influence of the overall market at that time.
So ... you need to use DIFFERENT TIME-PERIODS to isolate one group from another, when doing the "tweaking" and "proving" periods.
But that raises another issue ... if you tweak during a "bull" market period, and prove using a "bear" market period, then a Strategy that does well in Bull markets might appear to be over-fitted ... that is, the results of the tweak run would be a lot better than those of the prove-run ... but it would NOT have been due to over-fitting ... rather, it would have to do with differing market conditions.
This is an important concern.
In fact, it should carry over into your Strategy design as well!
If your Strategy is focused mainly on Bull-market conditions, then you should HAVE A FILTER in place that only allows trades during those markets. For instance:
GetClose($DJI) > SMA(GetClose($DJI),100)
might be (very simplistic) a "bull market condition" filter.
If you use this filter, then you'll avoid having the tweak and prove runs mismatched.
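For anyone who wants to check the logic of that filter outside of OT, here is a minimal Python sketch of the same comparison (latest index close versus its 100-bar simple moving average). The function names are hypothetical, and it assumes you already have the index closes as a list, oldest first.

```python
def sma(values, period):
    """Simple moving average of the last `period` values, or None if short."""
    if len(values) < period:
        return None
    return sum(values[-period:]) / period


def is_bull(index_closes, period=100):
    """True when the latest index close is above its moving average."""
    average = sma(index_closes, period)
    # With too little history there is no average, so do not allow trades.
    return average is not None and index_closes[-1] > average
```

Applied bar by bar, this gives a simple on/off switch for whether the Strategy is allowed to take trades on that bar.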
There's another nifty thing you might consider here ... AUTOMATIC selection of the time-periods to use for the testing and proving. The normal way is to specify particular periods for back and forward testing, via the Strategy or PS setups. Those inputs are not very versatile.
However, using a trick similar to the one I showed you for creating diverse Focus Lists, you can write OmniScript "time-period filters" for your tweak and prove runs ...
Let's say that you want the tweak tests to only utilize trades that start within the 2004-2006 calendar years. You can isolate these with a Filter:
BarYear() >= 2004 and BarYear() <= 2006
Of course, you can fill in whatever year(s) you want.
If you want to get really focused, you can build in BarMonth() rules as well. Or, if you're using intraday data to test 30min bars, for instance, you might need to define a range of BarDayOfMonth() rules.
If you're unclear on how this would be utilized, please CAREFULLY re-read the prior post.
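Here is a minimal Python sketch of the year-range idea, including a check that the tweak and prove periods do not overlap. The helper names are hypothetical, and the 2007-2008 prove period in the usage lines is only an illustration, not something taken from the posts above.

```python
from datetime import date


def in_years(bar_date, first_year, last_year):
    """Mirror of the year-range filter: first_year <= bar year <= last_year."""
    return first_year <= bar_date.year <= last_year


def split_bars(bar_dates, tweak_years, prove_years):
    """Split bar dates into tweak and prove sets by calendar year.

    tweak_years and prove_years are (first_year, last_year) pairs.
    Raises ValueError if the two periods share any year.
    """
    t_lo, t_hi = tweak_years
    p_lo, p_hi = prove_years
    if t_lo <= p_hi and p_lo <= t_hi:
        raise ValueError("tweak and prove periods overlap")
    tweak = [d for d in bar_dates if in_years(d, t_lo, t_hi)]
    prove = [d for d in bar_dates if in_years(d, p_lo, p_hi)]
    return tweak, prove


# Usage: tweak on 2004-2006, prove on a hypothetical 2007-2008 period.
bars = [date(2003, 6, 2), date(2004, 1, 5), date(2006, 12, 29),
        date(2007, 3, 1), date(2008, 7, 15)]
tweak_bars, prove_bars = split_bars(bars, (2004, 2006), (2007, 2008))
```

Nothing in the check requires the prove period to come later than the tweak period; it only requires that the two be different chunks of bars.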
Location: L'ville, GA
OK - that's enough for now!
I'm sorry that this is somewhat "technical" in nature - I realize that some of these concepts (&/or the solutions) will be foreign to newbie OT users ... or just plain difficult for anyone who is not accustomed to thinking about this stuff.
My goal has been to present a "broad brush" set of ideas, to help people who are currently involved in system optimization and evaluation to obtain more reliable, effective results.
I'm also hoping that these ideas will help less-experienced users to gain an appreciation for the huge amount of work that goes on (hopefully) behind the scenes, when a really good Strategy is developed.
As I wrote those (fairly long) posts ... I was continually tempted to digress (even more) into other rabbit trails that have a bearing on the topic. MUCH MORE could be said about each of the three areas.
If you're scratching your head, and going "hmmm, I wonder ...?"
--- well, then, I've succeeded!
Location: Austin, TX 78731
Fantastic job Jim!!!! Thank you for composing this, as we will certainly be pointing other customers to this as a resource!!!!!!
Great thread Jim!!
Location: L'ville, GA
Here's something else to consider, to "balance out" the prior comments.
The "prior-history curve-fit boogeyman" WILL BE present, for ANY system or strategy that has been "worked on" at all to improve it. It's almost impossible to avoid some degree of "fit".
We need to step back a bit and realize that to some degree, curve-fitting is just part of the game. Sad, but true.
That is, even if you create an entirely unique new Strategy, with every component invented from scratch, and base it ONLY on your current knowledge about TA and trading logic, and NEVER do any StrategyWizard runs or CB/NN/GA (NClub tools) optimization ... and you have all OT-parameter optimization turned off ... EVEN IN THAT CASE, there will be a small degree of what could broadly be called "curve-fitting."
The reason that this is true is simply due to your extant "common-sense knowledge". For example, "everyone knows" that a MACD(12,26,9) or an RSI(14) or a BolBand(20,2) or an ADX(14) is a "pretty good" setup to work with, when using those indicators. Somewhere, sometime in the past, you or other folks established those "common-knowledge" rules based on observed performance of the indicators on actual charts and/or with actual trading accounts. THEREFORE, any **subsequent** use of those same indicators and param's will be carrying forward that earlier, albeit crude, "curve fit" exercise.
Of course this example is pushing it ... but I think it's healthy to recognize that whatever we do, we do for reasons that are based on some form of prior experience and knowledge. That "past history" does not necessarily repeat itself ... but our memory and thinking process does.
We can help avoid "FOOLING OURSELVES" (that's the true curve-fit "gremlin") by taking reasonable steps when testing and tweaking, and being very VERY careful to eval those tests and tweaks against truly untouched, truly different data.
Prior posts in this thread have provided some broad-brush recommendations as to how to minimize the "fooling myself" pitfalls.
[Edited by Jim Dean on 6/28/2010 10:22 AM]
This is a top 5 thread of yours!! Altho over a month old, I just got to reading it! Nice work! And, many gracious thanks!!