Jim Dean | Sage | Posts: 3022 | Joined: 9/21/2006 | Location: L'ville, GA

Hi, James:

First, a disclaimer ... I'm not privy to any internal documentation that you don't have. So, these comments are "educated guesses" ...

"Binning" is a very common way of dealing with disparate datapoints, to try to make sense out of them. From other Nirvana responses on the forum, it appears that TradeScope uses the simplest form of binning, rather than a true statistical analysis based on things like root mean square and variance and other somewhat-mystifying terms.

It might help to think of TS bins as "stretchy buckets". Your settings-inputs, directly or indirectly, determine how many buckets there are and how big they are. I'll return to that in a moment - for now, just think of a row of buckets, side by side, with numbers marked on them. TS uses these buckets to hold "scores" (grades) of events that have occurred in the past, to help you evaluate the likelihood of a similar event producing a positive, negative or neutral score in the future.

The nature of the "event" is related to the "boolean variable" that you define, which as you say is either true or false. What that boolean variable does is define the "state" of the price/volume at a point in time, relative to whatever the rule has in it. That state is either True or False. There are two "events" that we can glean from the rule ... when it switches from T to F, and when it switches from F to T. TS (as I understand it) uses those events to trigger the beginning of a measurement - that is, the moment the rule flips from bull to bear, or vice versa.

From that point forward, the price rises or falls. TS measures HOW MUCH it rises or falls, for each instance of one of those historical events, and stores that information in a normalized form of some sort ... either a ratio vs price or vs ATR or something else. So, what is "remembered" for each of those T-to-F and F-to-T events is how much price moved on a relative basis afterward.

Furthermore, TS can adjust its "price-move capture-calibration" to a particular window of time, presumably suited to your trading style. It can look for the net price movement over N bars since the "boolean event" I just described occurred. So, the "remembered" relative-percent-price-move-since-the-boolean-flip can be relative to a particular time window.

OK - using this method of measurement, TS looks over the past history of each symbol in the focus list, over the defined backtest period, and collects a huge pile of percent-price-moves. Picture a big pile of index cards, each with a normalized price-change number written on it.

NOW we get to the binning thing. That big pile is not useful to us as-is ... so TS sorts thru the pile ahead of time, to let us know how the numbers on the index cards are "distributed" (from very positive to very negative price change).

When you input Equal Size, TS makes the RANGE of price-change represented by each bucket the same. For example, if the most-positive price change is 200% and the most-negative price change is -100%, and 30 buckets are used, then each bucket is "assigned" a price-change range of ten percentage points: [200 - (-100)] / 30 = 300 / 30 = 10.

With that particular collection of buckets (all the same size re price-change), TS then goes thru the big pile of index cards, flipping each of them into whichever bucket's range includes the number written on that card.
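If it helps to see the Equal Size idea in code, here's a minimal sketch - my own illustration, NOT Nirvana's actual implementation, and the function and variable names are made up. It assigns the "cards" to equal-width buckets and also counts them, since the counting step (described next) is what makes the sketch self-contained:

```python
def equal_size_bins(changes, n_bins=30):
    """Assign each normalized price change ('index card') to one of
    n_bins equal-width buckets, and count the cards per bucket."""
    lo, hi = min(changes), max(changes)
    width = (hi - lo) / n_bins            # e.g. (200 - (-100)) / 30 = 10
    counts = [0] * n_bins
    for x in changes:
        # Clamp so the single most-positive card lands in the last bucket.
        i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts
```

With that, counts[k] / len(changes) is the fraction of historical events that landed in bucket k - the raw material (as I read it) for the thermometer display.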
When it gets done with that process, it COUNTS HOW MANY CARDS (events with a given normalized price change) are in each bucket. In theory a bucket could hold anywhere from 0% to 100% of the total number of cards - of course, normally there will be a spread of results. That spread is used to paint the TS thermometer on the screen. It's not clear to me (sometimes) why the boundaries marked on the screen are what they are ... but their location and the percentage associated with them are based on the counts of index cards in the various buckets.

NOW, let's think about the EQUAL SAMPLE case. In the Equal SIZE case, we looked at the total numeric RANGE of normalized price change for the whole pile of cards, to determine the "range-subset label" of each bucket. But for EQUAL SAMPLES, we start off by COUNTING THE CARDS, regardless of the numbers written on them. Let's say we have 500 instances of boolean-event change that occurred in the historical study. If we use 25 bins, then the Equal Sample method tries its best to "size" the buckets so that there are TWENTY CARDS in EACH bucket ... ie equal samples.

How does it do this? It takes the big pile of index cards and SORTS THEM ... puts them in a long row, with the most-negative percent-price-change on the left and the most-positive percent-price-change on the right. Then, TS "walks along the row", picking up cards as it goes. It puts the first twenty cards in the first of the 25 buckets, the next twenty cards in the second bucket, and so forth. Each time it gets to a boundary, it looks at the VALUE WRITTEN on the last card of one bucket and the first card of the next bucket, and averages those two values. That average becomes the "range label" between the two buckets.

Sometimes, using this method, you get a very predictable set of buckets ... sort of a "bell curve" (or a version of one) ... where the outermost high/low buckets cover a very wide range of price changes, and the centermost buckets cover a much tighter range. That occurs when there are fewer cards with hugely-positive or hugely-negative changes ... which is "statistically typical". However, there can be situations where a given boolean-state-change definition defines a trigger that has LOTS of big-positive and/or big-negative price changes, with relatively few in the middle. Those are the "true gold" combos ... since they identify excellent trading opportunities. Of course, the trick is to figure out the right boolean rules ;~) Read through the explanation above about how TS examines the buckets when they are full, and paints the thermometer on the screen.

Finally, we have CUSTOM BOUNDARIES. I hope by now you can guess how this works. In this situation, TS does not look at the NUMBER of index cards (as it did for Equal Sample), nor at the RANGE of values written on the cards (as it did for Equal Size) ... it just looks at WHAT YOU TOLD IT. You get to arbitrarily define the normalized-price-percent-change boundaries for each bucket. Then TS goes thru the pile of index cards and tosses them into your special buckets ... and paints the thermometer as described before. (There's a small code sketch of this bookkeeping just below.)

Whew! Long explanation ... I hope it helped. And, once again, please note that I could be wrong in some parts of this description. I'm simply drawing on an understanding of well-established conventional "binning" (ie "bucketing" or "segmenting") methodologies, and trying to connect the dots to what TS is doing. If I'm incorrect, I hope that Nirvana will clarify.
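Here's a sketch of those last two ideas - again my own illustration with hypothetical names, not TradeScope's actual code. The first function derives Equal Sample boundaries by the walk-the-sorted-row method above; the second tosses cards into buckets given ANY boundary list, which is all Custom Boundaries needs:

```python
def equal_sample_boundaries(changes, n_bins=25):
    """Sort the 'cards' and cut them into n_bins buckets holding (roughly)
    equal counts; each boundary is the average of the last card in one
    bucket and the first card of the next."""
    cards = sorted(changes)              # most-negative ... most-positive
    per_bucket = len(cards) // n_bins    # e.g. 500 cards / 25 bins = 20
    return [(cards[k * per_bucket - 1] + cards[k * per_bucket]) / 2
            for k in range(1, n_bins)]

def count_cards(changes, boundaries):
    """Toss each card into the bucket implied by an ascending boundary
    list - the same step whether the boundaries came from Equal Size,
    Equal Sample, or your own Custom Boundaries."""
    counts = [0] * (len(boundaries) + 1)
    for x in changes:
        i = sum(1 for b in boundaries if x > b)   # boundaries x exceeds
        counts[i] += 1
    return counts
```

For Custom Boundaries, you would simply pass your own list of percent-change cutoffs to count_cards instead of the computed one.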
FINAL POINT ... if you choose Global vs Symbol-by-Symbol, the most important difference is that you get a LARGER PILE OF INDEX CARDS to work with. That's very important in statistical analysis: the bigger the pile, the more reliable the results.

If Nirvana had chosen to use true statistical methods for this, rather than the "arithmetic binning", then they could have output a "degree of confidence" for each of the percentage-up and percentage-down values. I think they made the right choice in keeping it simple. Firstly, it makes it easy to explain (see above). Secondly, it does not produce confusing "confidence" results for SMALL piles of index cards. Some users will choose to use short backtest periods, and/or very complex boolean rules, which won't produce a meaningful sampling of events. In those cases, the confidence values might be misleading.

So, should you always use Global? In addition to being more statistically robust, it also provides (in theory at least) somewhat faster processing time for the "training" of the TS bucket-brigade. ON THE OTHER HAND, if the symbols in the focus list you are using have individually-consistent but WIDELY different personalities from one another, then that big pile of index cards has apples and oranges and bananas and strawberries all mixed up. The reliability of the predictions would in that case drop off.

Nutshell recommendation:
1. If you have a disparate group of symbols, and plenty of historical events to draw from (re backtest period and boolean spec's), use Symbol-by-Symbol.
2. If your symbols are related ... ie part of the same Industry Group, or pre-filtered by Group Trader to find ones that are often "in sync" ... then use Global.
3. If you can't use much history (for some reason), then use Global, but ALSO break your focus list up into sub-groups of stocks with similar personalities.

JDref-711#3410