Last Activity: 10/7/2016 12:00 PM, 19 replies, 1415 viewings
kmcintyre | Elite | Posts: 890 | Joined: 10/11/2012 | Location: Portland, OR
I just completed a study in which I tried to capture a glimpse of how well PW performs. (A glimpse because, although the study took a while to perform, it really represents a fairly small number of PW runs.)

I made 28 PW runs using settings of (100,1,5,4). The strategy universe was all the stock RTM strategies. All standard EFs were selected. Default account settings were used except for the simulation date range. The 28 runs each had a duration of 2 years. The start dates ran from 10/1/2012 - 10/28/2012. So what I was capturing was how single-day shifts in the date range would affect PW results.

I've attached two spreadsheets. They are the same except that one has a sheet capturing graphs of ending equity values, while the other captures graphs of frequency distribution for CAR, MAXDD, and Calmar for the 21 EFs. (I got carried away with pivot tables and my spreadsheet got too big to upload, so I had to split it up...)

I'm sure you guys will be able to draw your own conclusions (if you care), but here are some of my thoughts.

1) On the Summary sheet, I ranked the EFs based on CAR and Calmar. The rankings reflect how the EFs performed relative to each other over the 28 "competitions". This time EMA(C,7)/EMA(C,21) fared best. (Previously I posted a spreadsheet where LRS(5)-LRS(30) fared best, so no confirmation...)

2) The number of winning portfolios over a 2 year period vastly outnumbered the losing portfolios. You could hardly lose money with OV given a two year time horizon!

3) The Equity Graphs sheet shows the issue I'm concerned about. The results for all EFs display very wide swings in ending equity simply based on rebalance date. And it's not like all the graphs show peaks and valleys occurring at similar times. It really looks random to me. What would I expect? I would expect to see small variations based on start date, but since 728 of 730 calendar days were shared, the delta in return would be very muted. I.e., I would expect far more consistency from sample to sample.

4) The EF Graphs sheet shows the frequency distribution of CAR, MAXDD, and Calmar for each of the EFs over the 28 PW runs. I used 2.5% as the bin width for CAR and Calmar, and 1% for the MAXDD. (I started out picking custom bin widths for each graph, but decided there was value in sticking to the same bin values for all EFs. It shows the range of variation through the number of bars displayed on the graphs. More bars = more variation.) So what I was looking for on these graphs was central tendency. Did the EF provide reproducible results, or was it all just randomness? Was there a range of CAR values I could reasonably expect to see from an EF? Remember, these results are over 2 years of trading, and only vary because of the day the party began. What do I see? Mostly randomness. There are some graphs that show reproducibility, but for the most part I'm left with the feeling that an EF's performance metrics are affected as much by how lucky I am in getting in on the "right" date as by anything else. Maybe I spend two years and get poor results, or maybe I spend two years and get good results. Was it Monday or Thursday?

5) The EF Data sheet collects data from all the individual PW run result sheets and organizes the data by EF (versus date). I added min, max, mean, mode, and standard deviation stats for each EF. Again, these stats appear to reflect a lot of variability in PW performance IMO. I might be really happy or really disappointed after two years based on the day in October 2012 I created my Dynamic Portfolio. That hardly seems like sound investment decision making. It feels more like playing the lottery.

All that said, my expertise is not statistics. I've heard it said that "statistics is the art of creating unreliable facts from reliable data". So I could probably use some schooling on interpreting all the data I've collected.

I am in the process of collecting and analyzing data on custom RTM strategies and other simulation ranges. If I find something more hopeful I will share it. (I hope others will reciprocate.)

Cheers
Keith

PS - spreadsheets are Excel 2010...
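For anyone who wants to reproduce the fixed-width binning from point 4 and the per-EF stats from point 5 outside of Excel, here is a minimal Python sketch. The CAR values are made up for illustration and the function name `bin_counts` is hypothetical; only the 2.5% bin width comes from the post.

```python
import math
import statistics
from collections import Counter


def bin_counts(values, width):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    return Counter(math.floor(v / width) * width for v in values)


# Hypothetical CAR values (percent) for one EF across a handful of runs.
car_by_run = [0.9, 10.0, 12.4, 22.6, -1.0]

# Same bin width for every EF, so the number of occupied bins
# is comparable from one EF to the next.
car_bins = bin_counts(car_by_run, 2.5)
for lower_edge in sorted(car_bins):
    print(f"{lower_edge:6.1f} to {lower_edge + 2.5:6.1f}: {car_bins[lower_edge]}")

# Summary stats of the kind collected on the EF Data sheet.
summary = {
    "min": min(car_by_run),
    "max": max(car_by_run),
    "mean": statistics.mean(car_by_run),
    "stdev": statistics.stdev(car_by_run),
}
print(summary)
```

Using one bin width for all EFs means a wider spread shows up directly as more occupied bins, which is the "more bars = more variation" reading described above.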

Steve Mayo | Legend | Posts: 414 | Joined: 10/11/2012 | Location: Austin, TX
Inferential statistics require having a hypothesis, call it a theory, that you are trying to accept or reject. What's your hypothesis? Your conclusion seems to be that PW is not useful because there is variability when you shift the starting/ending date. But you give nothing against which to compare that variability.

Here's a distribution graph looking at rolling quarterly returns for an OV portfolio (I forget now which one, and it really doesn't matter because most of them behave similarly) versus an EQUALLY-LEVERAGED market portfolio (i.e., both were at 200% margin and the same starting equity). It uses about 16 months of data ending in October. You can clearly see that both OV and SPY have high variability and OV is slightly higher risk on the downside, but the mean return for OV is twice that of the market; there is a clear rightward shift toward more and higher positive returns. You don't need a p-value here to see that your odds of making a profit (and more so, of not losing money) are going to be better with the OV portfolio.

This was a stock OV portfolio, so it's not a test of the value of PW. To do that experiment properly, you would need to compare non-PW portfolios and the market return to your results. Unfortunately, that's really hard to do. The experiment I did on the PW for OmniVesting.com did suggest that PW is better than no PW, but again it takes a whole lot more data when analyzing anything in the stock market due to that high inherent variability, so a definitive answer just isn't possible.

My point is just that variability is an inherent feature of the market. Saying that PW has variability too doesn't mean anything in that context. Is the variability higher or lower than a stock OV portfolio, or higher/lower than the overall market? And what's the nature of that variability? Generally, people care most about not losing money; they are willing to accept variability on the gain side if it reduces the frequency of trades on the loss side.

Your data seems to support this as well. If you look at your rolling return summaries, very few of the intervals ended with a loss, and only the RSI failed to produce a positive result with 95% confidence - I'd already advised to avoid MDD, Calmar and CAR for technical reasons. Even not knowing what the overall market did in those timeframes, it still looks pretty impressive to me.

[Edited by Steve Mayo on 11/14/2014 6:14 PM]
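The rolling-return comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the equity values are invented, the 3-day window stands in for a quarter, and the helper names `rolling_returns` and `loss_fraction` are hypothetical. Running it on both an OV equity series and an equally-leveraged SPY series over the same dates would give the two distributions to compare.

```python
def rolling_returns(equity, window):
    """Return the simple return over every `window`-step span of an equity series."""
    return [
        equity[i + window] / equity[i] - 1.0
        for i in range(len(equity) - window)
    ]


def loss_fraction(returns):
    """Fraction of periods that ended with a loss."""
    return sum(r < 0 for r in returns) / len(returns)


# Hypothetical mark-to-market equity values, one per trading day.
equity = [100.0, 102.0, 101.0, 105.0, 99.0, 110.0]

three_day = rolling_returns(equity, 3)
print(three_day)
print(loss_fraction(three_day))
```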

kmcintyre | Elite | Posts: 890 | Joined: 10/11/2012 | Location: Portland, OR
Steve,

I will collect the gain/loss stats for the SPY over 28 two-year periods, with starting dates 10/1/2012 - 10/28/2012. I will graph the results in the same way I did the ending equity for PW. (I will probably need to normalize the SPY to 100,000 so the graphs are apples to apples.) My guess is that I will not see wild variations in returns from day to day.

My point is that the day-to-day performance I'm seeing when using PW and the available EFs is very inconsistent. I can't explain why 2 year returns are so wildly different based on a few days' difference in start date. I also can't explain why EFs work well sometimes and not so well other times. Especially when the "times" are within days of each other. So my hypothesis is that PW results are based more on luck than reliable prediction.

Is PW better than buy-and-hold of the SPY? I believe you when you say it is. But that's not what I've been focused on. I want consistent, reliable, and understandable results. All the data I've collected over months of testing display the same inability of PW to find the winning strategies on its own AND produce consistent results, regardless of minor variations in account settings, including simulation date ranges. The variations defy logic IMO.

My goal is not to tear down PW, but rather to point to areas where it needs improvement. That's what engineers do.

I'll post my SPY comparison within a day or two.

Keith
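Normalizing the SPY to a 100,000 starting value is a simple rescaling. A minimal sketch, assuming plain closing prices with no leverage, dividends, or compounding adjustments (the prices and the function name `normalize_to_equity` are hypothetical):

```python
def normalize_to_equity(prices, start_equity=100_000.0):
    """Scale a price series so its first value equals start_equity."""
    first = prices[0]
    return [p * start_equity / first for p in prices]


# Hypothetical SPY closing prices.
spy_closes = [200.0, 210.0, 190.0]

spy_equity = normalize_to_equity(spy_closes)
print(spy_equity)
```

For an apples-to-apples chart, the PW runs and the SPY series would also need to assume the same percent invested, which this sketch does not model.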

kmcintyre | Elite | Posts: 890 | Joined: 10/11/2012 | Location: Portland, OR
Steve,

Well, building that SPY test case was interesting. I'm attaching the spreadsheet. More variance in 2 year returns than I would have guessed based on the entry date. But the standard deviation of ending equity looks smaller than most of the PW results. So I still think PW is more volatile.

But what puzzles me (and I haven't taken a lot of time to dig into this) is that the return on buy and hold SPY was way higher than any of the PW EFs generated, based on mean ending equity. And I didn't even use compounding when estimating the SPY ending equity values.

Further, the SPY test highlighted the fact that weekends don't generate equity values. The PW tests include runs for every date between 10/1/12 and 10/28/12. But I don't see flat spots in the ending equity curves I generated for all the EFs. So I need to figure that out too.

If you have insight into why my SPY estimates (I know they are ballpark swags) seem to outpace all the PW runs I'd really appreciate your thoughts. (I threw the spreadsheet together pretty quickly. Maybe I've had another brain fart. It's been known to happen. Increasingly with age. :-) )

Cheers
Keith

Steve2 | Elite | Posts: 750 | Joined: 10/11/2012 | Location: Annapolis, MD
Hi Keith,

I just looked at a few months of the OV spreadsheet, but I think the difference in returns between OV and SPY is because you assumed the SPY average percent invested was 195%, while the OV portfolios were always less than 100% invested. If you crank up OV allocations, most portfolios would easily outperform SPY.

Steve

P.S. Will offer my 2 cents on the randomness later when I have more time.

Steve Mayo | Legend | Posts: 414 | Joined: 10/11/2012 | Location: Austin, TX
Hi Keith,

This is an interesting series. Thanks for starting it. Can you download and post the actual daily mark-to-market values from your equity curve, say from the best-performing EF? I need at least 3, preferably 5-7 years to get a reasonable distribution graph. I can match it to SPY, deal with the holidays and weekends, deal with the differing leverage, etc. It will be a good test of the analysis tool I'm trying to create. I'll post back the results.

Steve

Steve2 | Elite | Posts: 750 | Joined: 10/11/2012 | Location: Annapolis, MD
Keith,

My thoughts on the seemingly random nature of OV simulations.

The ending equity of any OV simulation is the result of a sequence of trades taken by the simulation. While it's tempting to think that seemingly small changes in a simulation's parameters (e.g., moving the start/end dates by one trading day) shouldn't have much effect on ending equity, that's not the case, because any change to parameters will cause a different sequence of trades to be generated/taken.

With an account comprised of static portfolios, this randomness gets magnified because of account equity constraints. That is, it is frequently the case that you can't take all the trades that are generated on a given day because you don't have sufficient buying power. So even small changes to the sequence of generated trades can result in rather dramatic changes to the trades that are actually taken by the simulation. I spent many hours in the early days analyzing individual strategy performance within a portfolio and trying to do strategy replacement to improve performance. It never worked due to the random filtering introduced by buying power limitations.

With dynamic portfolios, randomness is magnified once again because enabled strategies can change each period. So, seemingly small changes to account settings that impact EF calculations can result in different strategies being enabled, which further impacts the trades generated/taken.

So, I think the important thing to remember is that any simulation is just one data point. The higher the returns and the more consistent the returns are over the simulation period the better. But it's still just one data point, and one needs to do sensitivity analyses on the simulation parameters, including looking at rolling returns, to really get a feel for how the account might perform.

One of the things I've struggled with is how to verify that PW is working correctly. I'm not sure we have enough visibility into how things work to do this, but it might be interesting, in your test, to look at one EF and see how the enabled strategies in the starting period changed over the first 28 days and whether that makes sense based on what the market was doing and the nature of the EF.

Steve

[Edited by Steve2 on 11/15/2014 11:45 AM]

kmcintyre | Elite | Posts: 890 | Joined: 10/11/2012 | Location: Portland, OR
SteveM,

The initial goal of my spreadsheet was to ascertain which EFs are "best", where "best" means most consistently generating higher CAR and/or Calmar values. Hence my attempt to rank EF performance over many competitions, then rank the individual rankings. I did a similar test that ranked EFs based on 24 competitions, where each competition lasted 1 year, and the start dates shifted by 1 month. This was a follow-up.

I added all the frequency distribution charts because I saw that EF performance varied significantly from day to day. And my results from this experiment failed to confirm my previous experiment. I added the equity curve graphs as icing because I thought more people could relate to ending equity.

So when you ask for equity curves from my top 3 EFs over a 7 year period, I have to ask "based on which day of the week?". Because I haven't seen any consistency in EF performance. No EF has floated to the top as the top performer. EF performance seems more random than reliable IMO.

I'm happy to collect data on all EFs based on a 7 year simulation period. But for sure (and to Steve2's post) a significant number of 7 year simulations will be required to determine the average performance of each EF. I could vary start date by a day, or a month. (I've done both.) Or weekly. Or 15 days. Or quarterly. Which would you prefer?

I *KNOW* OV is capable of generating obscene returns. The key is to predict which strategy combinations to use with reasonable accuracy. Prediction is hard, but it is the key to OV success IMO. And I just don't see evidence that the current EFs are sufficiently accurate at prediction.

Keith
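The "rank each competition, then rank the rankings" idea can be sketched as below. This is one plausible reading, not necessarily how the spreadsheet does it: EFs are ranked by CAR within each run, then ordered by average rank. The EF labels and CAR values are placeholders, and ties are simply left in input order.

```python
from statistics import mean


def rank_efs(car_by_ef):
    """Rank EFs within one competition: 1 = highest CAR. Ties keep input order."""
    ordered = sorted(car_by_ef, key=car_by_ef.get, reverse=True)
    return {ef: position for position, ef in enumerate(ordered, start=1)}


def overall_ranking(competitions):
    """Order EFs by their average rank across all competitions (best first)."""
    ranks = [rank_efs(c) for c in competitions]
    avg = {ef: mean(r[ef] for r in ranks) for ef in ranks[0]}
    return sorted(avg, key=avg.get), avg


# Hypothetical CAR (percent) per EF for three runs; labels are placeholders.
competitions = [
    {"EF_A": 12.0, "EF_B": 8.0, "EF_C": 3.0},
    {"EF_A": 5.0, "EF_B": 9.0, "EF_C": 1.0},
    {"EF_A": 10.0, "EF_B": 4.0, "EF_C": 6.0},
]

order, avg_rank = overall_ranking(competitions)
print(order)
print(avg_rank)
```

If the average ranks of all EFs sit close together, that would be consistent with the observation that no EF has floated to the top.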

Steve Mayo | Legend | Posts: 414 | Joined: 10/11/2012 | Location: Austin, TX
Keith, what you are pointing out is the problem with looking at simulated equity curves. That pretty curve that OV draws is just one sample out of a whole bunch of possible outcomes. We all tend to look at those pretty curves and immediately think that it has more meaning than what it actually conveys. To make matters worse, we look at "statistics" on that pretty curve, such as CAR and MDD, and let those set our expectations for the future. They might be accurate, but only in the astronomical chance that the future is exactly like the past, and more particularly that the future is like THAT SPECIFIC slice of history that we simulated.

Instead, we need to be looking at ALL the possible outcomes and, as much as is possible, apply some probabilities. The proper tool for that is distribution curves or box plots, but most people get glassy-eyed even at the mention of such things. People don't understand probabilities (look at the success of the lottery or the over-reaction to Ebola as examples). We all want a clean single number...and thus all technical analysis programs, be it OV, Tradestation or whatever, must give us, the customers, what we want to see, a pretty equity curve...so we can let our greed go wild.

Going back to that example I provided above. The OV simulation had a CAR of 44% and an MDD of 19.5%, which sounds pretty good, but if you take that historical data and look at all the possible outcomes (using bootstrapping and Monte Carlo analysis), you find that this particular OV portfolio has a mean of around 35% (geometric mean or "CAR") and a standard deviation (of this CAR) of around 30%. In other words, that 44% CAR shown by OV is 0.3 standard deviations above the mean, meaning that you really only have a 35% chance of getting better and a 65% chance of doing worse.

To end on a high note, the mean on the equally-leveraged SPY buy&hold portfolio was about 23% with a standard deviation of about 22%, so the OV portfolio has a slightly better volatility-adjusted return (Sharpe), and per the rolling distribution curve its higher volatility seems to be on the positive side.

SteveM

Corrected to confirm that I was using the proper geometric (CAR) means in my analysis and to upload the graphs

[Edited by Steve Mayo on 11/17/2014 2:16 PM]
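The arithmetic behind the "0.3 standard deviations" remark can be checked with the standard library. The sketch below assumes the bootstrapped CARs are roughly normally distributed, which is an assumption made here for illustration, and uses mean divided by standard deviation as a crude stand-in for Sharpe (no risk-free rate).

```python
from statistics import NormalDist

simulated_car = 44.0   # CAR reported for the single simulation, in percent
bootstrap_mean = 35.0  # mean CAR across resampled outcomes, in percent
bootstrap_sd = 30.0    # standard deviation of that CAR, in percent

# How far the single simulated result sits above the mean outcome.
z = (simulated_car - bootstrap_mean) / bootstrap_sd

# Chance of beating the simulated CAR under a normal approximation.
p_better = 1.0 - NormalDist().cdf(z)
print(f"z = {z:.2f}, P(CAR > {simulated_car:.0f}%) = {p_better:.1%}")

# Crude volatility-adjusted comparison: mean return per unit of deviation.
ov_ratio = bootstrap_mean / bootstrap_sd  # OV portfolio
spy_ratio = 23.0 / 22.0                   # equally-leveraged SPY buy & hold
print(f"OV {ov_ratio:.2f} vs SPY {spy_ratio:.2f}")
```

Under the normal approximation the chance of doing better comes out at roughly 38%, in the same neighborhood as the rounder figure quoted above.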

kmcintyre | Elite | Posts: 890 | Joined: 10/11/2012 | Location: Portland, OR
Steve,

I follow what you're saying. I'm seeing WILR(14), over ^2 years^ of trading, generating CARs of 0.9% and 22.6% - based solely on which day in October the DP was generated. I'm seeing RSI(C,5) generate CARs between 0.0% and 16.2%. ROC(LNREG_SLOPE(5),10) between -1% and 21%. These are (IMO) wild swings.

How can I pick an EF when their performance is so unpredictable and seemingly capricious? And how can I trust the CAR produced by an EF will hold up in the future? Will I end up with -1% or 21% (or something even more extreme)? I don't get a warm fuzzy from the data I've collected. It just feels wrong. The frequency distribution graphs seem to confirm my gut.

YMMV
Keith

Steve2 | Elite | Posts: 750 | Joined: 10/11/2012 | Location: Annapolis, MD
Keith,

For comparison purposes, I conducted a similar test for one of my static portfolios. The test was conducted using 6 year simulations. As you can see, the results vary more than you would think. Maximum variation in ending equity was 16%, maximum daily variation in ending equity was 8%, with an average daily variation of 3%. However, the variations did seem to run in cycles. I haven't correlated these to the market movement in the starting and ending months but I'll try and do that.

I also want to look at the trade histories for consecutive days and see how they changed from day to day. You would expect that starting and ending trades would change, but I wonder how far differences in starting trades ripple through the sim.

I still think the key thing to quantify for dynamic portfolios is how much the initial enabled strategies change over the 28 days. Different strategies will obviously produce different results. If there is significant change then the question is whether that makes sense given how each of the EFs is calculated.

Steve2

[Edited by Steve2 on 11/16/2014 10:39 AM]

kmcintyre | Elite | Posts: 890 | Joined: 10/11/2012 | Location: Portland, OR
Steve2,

Good information and points. (Steve Mayo too!) The better I understand the expected behaviors, the easier it is to keep my head in adversity. And the more intelligently I can decide if I have the risk tolerance to play.

There are three experiments I want to try -

1) Run another 28 PW runs covering 10/1/2010 - 10/28/2012. Then see if the EF performance during those runs is predictive of the results I've already captured for 10/1/2012 - 10/28/2014. So "is past EF performance predictive of future EF performance".

2) Run 7+ year PW runs starting 1/1/2007 - 1/28/2007. Do I see similar frequency distribution curves for EF CAR, MAXDD, and Calmar?

3) Run 28 simulations using ARM4 Margin, with starting dates between 10/1/2012 and 10/28/2012 - 2 year duration. Compare the stats against the ones I collected on EFs and see how ARM4 Margin compares.

That should only take a month or two...

Keith

gbarber | Veteran | Posts: 282 | Joined: 12/30/2012 | Location: Pearland, TX
I am not very good with statistics but I'll throw in 2 cents anyway. I am looking at the set of results shown by Steve2:

1. The set of results shown is really good and reasonably consistent. Look at the CAR and the MDD. They vary by a small amount and the CAR stays above 100% for all the runs. The MDD is higher than I could stand but that comes with the very high CAR (I hope we can all agree that a CAR of > 100% is very high).

2. Note that the high variances in ending equity are all on the plus side. I suspect no one would complain about a gain greater than expected. The variances on the negative side are relatively small.

3. Note that there are several groups of ending equity numbers in which the final equity does not change. I'll bet there were no closed trades on those days.

4. Note that each run deletes a day on the start side and adds a day on the ending side. Thus each subsequent run excludes one day of trades that were included in the previous run and includes one day of trades that were not included in the previous run. Thus it seems to me that there is no doubt that the results would be different by some amount. The amount should correlate with the trades that were excluded and included. Has anyone looked at those to see if there is a correlation? Also remember that the equity curve reflects mark-to-market numbers, so it is the value of the account at the end from the trades made, including the value of the open positions.

5. Note that the further into the future the runs go, the smaller the variation in ending equity becomes. That seems to indicate OV is predicting future results fairly well.

[Edited by gbarber on 11/16/2014 9:25 PM]

Steve2 | Elite | Posts: 750 | Joined: 10/11/2012 | Location: Annapolis, MD
Just to complete the above static portfolio analysis, I compared the trade histories for the simulations starting on 10/1 and 10/2. There were 7,852 positions in the 10/2 simulation. For each position on 10/2, I attempted to find a match in the 10/1 trade history.

First I considered a match to be true if Strategy, Symbol, Open Date, and Close Date matched. This showed that there were 417 positions (5.3% of total) that did not match. While a small number could be attributed to the one day shift in starting dates, the vast majority of differences were due to account equity constraints where there was enough buying power to allow additional trade(s) to be taken. These seem to occur throughout the simulation date range.

Next I considered a match to be true if Strategy, Symbol, Open Date, Close Date, and Share Quantity matched. This showed that there were 5,127 positions (65.3% of total) that didn't match. This of course was due to the difference in account values over the life of the two simulations.

As you can see, it is not surprising to see potentially significant differences in ending equity between two static portfolio simulations using the same account settings and only a one day difference in the simulation range. I suspect the difference between dynamic portfolio simulations can potentially be greater since there is also opportunity for differences in enabled strategies during the simulation periods.

So, I guess the lesson here is that it is good to do a sensitivity analysis around simulation date ranges to assess variability. I would expect that the more consistent the returns are over a simulation range, the lower the variability would be. Thanks Keith for bringing this up.

Next, I'm going to turn my static portfolio into a dynamic one, starting with a high number of enabled strategies (e.g., 20 of 23), and repeat the experiment. It will be interesting to see if the variability increases due to differences in selected strategies each period. I'll report back...

Steve2
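The two matching passes described above amount to comparing trade histories on a chosen set of fields. Here is a minimal sketch of that comparison; the `Position` record layout, the strategy names, and the sample trades are all hypothetical and are not OV's export format.

```python
from collections import namedtuple

Position = namedtuple("Position", "strategy symbol open_date close_date shares")


def unmatched(run_b, run_a, fields):
    """Return positions in run_b with no position in run_a sharing `fields`."""
    def key(p):
        return tuple(getattr(p, f) for f in fields)

    seen = {key(p) for p in run_a}
    return [p for p in run_b if key(p) not in seen]


TRADE_KEY = ("strategy", "symbol", "open_date", "close_date")

# Hypothetical trade histories for two runs that start one day apart.
run_1001 = [
    Position("RTM1", "AAPL", "2012-10-03", "2012-10-08", 100),
    Position("RTM2", "MSFT", "2012-10-04", "2012-10-09", 200),
]
run_1002 = [
    Position("RTM1", "AAPL", "2012-10-03", "2012-10-08", 90),  # same trade, different size
    Position("RTM2", "IBM", "2012-10-04", "2012-10-10", 50),   # trade absent from the other run
]

# First pass: ignore share quantity. Second pass: require it to match too.
by_trade = unmatched(run_1002, run_1001, TRADE_KEY)
by_trade_and_size = unmatched(run_1002, run_1001, TRADE_KEY + ("shares",))
print(len(by_trade), len(by_trade_and_size))
```

Dividing each count by the number of positions in the later run gives percentages comparable to the ones quoted above.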

Steve Mayo | Legend | Posts: 414 | Joined: 10/11/2012 | Location: Austin, TX
This is getting off topic, but I wanted to caution against some of the comments herein. This "shifting one day" analysis is called "Rolling Returns." It's a good way to see what the potential return might have been over any x-day period within a range, but it's not good for calculating mean and standard deviation. The reason is that it repeatedly uses the same central dates over and over each time. For example, if 15-Oct was an outlier, it will be in every rolling return you are calculating.

The proper analysis method is to use bootstrapping with replacement in a Monte Carlo, which is what I was showing above. That sampling method draws its samples independently; therefore the moments of the resulting distribution (mean, standard deviation, skew, kurtosis, etc.) are accurate (at least within the confines of the historic data). Calculating those moments with samples that are not independent (e.g., the rolling returns) can be highly misleading.

[Edited by Steve Mayo on 11/17/2014 2:09 PM]
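For readers unfamiliar with the term, a bare-bones version of bootstrapping with replacement looks something like the sketch below. It is not the analysis tool mentioned in this thread: it resamples individual daily returns (so it assumes one day's return is independent of the next), uses 252 trading days per year, and runs on invented returns. The function name `bootstrap_cars` is hypothetical.

```python
import random
import statistics


def bootstrap_cars(daily_returns, days_per_path, n_paths, periods_per_year=252, seed=0):
    """Resample daily returns with replacement and return each path's CAR."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    years = days_per_path / periods_per_year
    cars = []
    for _ in range(n_paths):
        growth = 1.0
        # Each path is an independent draw of days from the same history.
        for r in rng.choices(daily_returns, k=days_per_path):
            growth *= 1.0 + r
        cars.append(growth ** (1.0 / years) - 1.0)
    return cars


# Hypothetical daily returns; real input would be the day-over-day changes
# in a simulation's mark-to-market equity.
history = [0.012, -0.008, 0.004, 0.0, -0.015, 0.02, 0.003, -0.002]

cars = bootstrap_cars(history, days_per_path=252, n_paths=1000)
print(statistics.mean(cars), statistics.stdev(cars))
```

Because every path is drawn independently, an outlier day appears in some paths and not others, unlike rolling windows where the same central dates recur in every sample.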

kmcintyre | Elite | Posts: 890 | Joined: 10/11/2012 | Location: Portland, OR
The frequency distribution charts show the range of CAR, MAXDD, and Calmar returned by each of the 21 default EFs. (I should have never included the ending equity graphs...) There is no central tendency in nearly all of those frequency distribution graphs. So StdDev is useless. Mean and median are still valid.

I believe my point remains valid. From day to day, EF performance is nearly random. Even after 730 days and hundreds (thousands?) of transactions, most EFs exhibit little consistency in CAR, MAXDD, or Calmar. If you get good results over a 2 year period, using a specific EF, starting on 10/5, you can't expect similar performance starting on 10/6. Hence all PW results have to be taken with a BIG grain of salt.

I'm collecting data on 7+ year PW runs as I write this. Perhaps I'll start seeing consistency with a longer time horizon.

Keith

Mark Holstius | Elite | Posts: 744 | Joined: 10/11/2012 | Location: Sleepy Hollow, IL
Good morning Keith...

Would you be so kind as to take a snapshot of the account settings you're using for these tests? You referred to them as "default"...

And thanks for all your testing,
Mark

[Edited by Mark Holstius on 11/19/2014 9:09 AM]

kmcintyre | Elite | Posts: 890 | Joined: 10/11/2012 | Location: Portland, OR
Mark,

I created a new account and used the settings OV defaults to. I've attached a png. These are not settings I would use. I was simply trying to keep it simple, focus on EFs, and go with the most basic defaults/strategies possible.

Keith

Mark Holstius | Elite | Posts: 744 | Joined: 10/11/2012 | Location: Sleepy Hollow, IL
Hi Keith....

Steve & I have debated about and studied the Account Settings a lot because they have a huge effect on the results - and have written quite a bit about that in the forum. I started doing all my testing with my chosen settings (vs Nirvana's) so I know what to expect and can compare results consistently, but that's a personal choice.

The settings are obviously very 'personal' and need to fit your trading style, but here are some observations about the settings that might be skewing the results you're getting (the "I like(s)" are just personal preferences from my testing);

Best of luck - thanks again for your insights,
Mark

[Edited by Mark Holstius on 11/19/2014 7:57 PM]

kmcintyre | Elite | Posts: 890 | Joined: 10/11/2012 | Location: Portland, OR
Mark,

The default settings are not what I have or would trade. But I found myself creating so many test accounts, and sometimes forgetting to import settings from my other "template" accounts, that I figured if I wasn't focused on something I was going to trade, or experimenting with the effects of account settings, I would just go with the defaults. KISS. I need all the help I can get! :-)

Keith