Tuesday, 18 November 2014

Large, Representative Samples

As suggested earlier, failure is often a consequence of presenting an optimizer with the wrong problem to solve. Conversely, success is likely when the optimizer is presented with the right problem: optimize on data from the near future, the data that will actually be traded, and watch the profits roll in. The catch is where to find tomorrow’s data today. Since the future has not yet happened, it is impossible to present the optimizer with precisely the problem that needs to be solved. Consequently, it is necessary to attempt the next-best alternative: to present the optimizer with a broader problem, the solution to which should be as applicable as possible to the actual, but impossible-to-solve, problem.

One way to accomplish this is with a data sample that, even though not drawn from the future, embodies many characteristics that might appear in future samples. Such a data sample should include bull and bear markets, trending and nontrending periods, and even crashes. In addition, the data in the sample should be as recent as possible so that it will reflect current patterns of market behavior. This is what is meant by a representative sample. As well as being representative, the sample should be large. Large samples make it harder for optimizers to uncover spurious or artifact-determined solutions. Shrinkage, the expected decline in performance on data that were not used in the optimization, is reduced when large samples are employed in the optimization process.
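To make the point about shrinkage concrete, here is a minimal simulation sketch. It is not taken from any particular trading platform; the function name, the number of candidate parameter sets, and the use of Gaussian noise for trade profits are all assumptions made for illustration. Every candidate has a true expected profit of zero, so whatever the "best" candidate earns in-sample is exactly what would be expected to vanish on new data.

```python
import random
import statistics


def best_in_sample_mean(n_trades, n_candidates=200, seed=0):
    """Best average per-trade profit among candidates that have no real edge.

    Hypothetical setup: each candidate stands for one parameter setting
    an optimizer might try, and its trades are pure noise.
    """
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_candidates):
        # True expected profit per trade is zero for every candidate.
        profits = [rng.gauss(0.0, 1.0) for _ in range(n_trades)]
        best = max(best, statistics.fmean(profits))
    return best


small = best_in_sample_mean(n_trades=20)
large = best_in_sample_mean(n_trades=2000)
print(f"best in-sample mean with 20 trades per candidate:   {small:.3f}")
print(f"best in-sample mean with 2000 trades per candidate: {large:.3f}")
```

Under these assumptions, the winner picked from the small sample looks far more profitable than the winner picked from the large sample, even though neither has any edge. The gap between that in-sample figure and the true value of zero is the shrinkage, and it narrows as the sample grows.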

Sometimes, however, a trade-off must be made between the sample’s size and the extent to which it is representative. As one goes farther back in history to bolster a sample, the data may become less representative of current market conditions. In some instances, there is a clear transition point beyond which the data become much less representative: for example, the S&P 500 futures began trading in 1982, effecting a structural change in the general market. Trade-offs become much less of an issue when working with intraday data on short time frames, where tens of thousands or even hundreds of thousands of bars of data can be gathered without going back beyond the recent past.
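A quick back-of-the-envelope calculation shows how fast intraday bars accumulate. The sketch below assumes a 390-minute regular session and 252 trading days per year; both figures are illustrative assumptions, not properties of any specific market or data feed, and the function name is hypothetical.

```python
def bars_per_year(bar_minutes, session_minutes=390, trading_days=252):
    """Approximate number of complete bars in one year of regular sessions.

    session_minutes and trading_days are assumed values for illustration.
    """
    return (session_minutes // bar_minutes) * trading_days


for minutes in (1, 5, 30):
    print(f"{minutes:>2}-minute bars per year: {bars_per_year(minutes):,}")
```

With these assumptions, a single year of 5-minute bars already gives close to twenty thousand data points, and 1-minute bars approach one hundred thousand, so a large sample need not reach back into a different market regime.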

Finally, when running simulations and optimizations, pay attention to the number of trades a system takes. As with large data samples, numerous trades are highly desirable in simulations and tests. Chance or artifact can easily be responsible for any profits produced by a system that takes only a few trades, regardless of the number of data points used in the test!
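One rough way to see why the trade count matters is to compare the average profit per trade with its standard error, which is the familiar one-sample t-statistic. The sketch below is only an illustration: the trade profits are made-up numbers, the function name is hypothetical, and the statistic assumes that trades are independent, which real trades often are not.

```python
import math
import statistics


def t_statistic(trade_profits):
    """Mean profit per trade divided by its standard error."""
    n = len(trade_profits)
    mean = statistics.fmean(trade_profits)
    std_err = statistics.stdev(trade_profits) / math.sqrt(n)
    return mean / std_err


# Hypothetical per-trade profits, invented for this example.
few = [120.0, -80.0, 200.0, -50.0, 60.0]
# Repeating the list is an artificial way to get many trades
# with the same average profit and a similar spread.
many = few * 40

print(f"{len(few)} trades:  t = {t_statistic(few):.2f}")
print(f"{len(many)} trades: t = {t_statistic(many):.2f}")
```

Both lists have the same average profit per trade, yet the five-trade result sits well within what chance alone could produce, while the 200-trade result does not. The number of bars in the test plays no part in this calculation; only the trades do.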
