Consider the impact of small samples on the optimization process. Small samples of market data are unlikely to be representative of the universe from which they are drawn: consequently, they will probably differ significantly from other samples obtained from the same universe. Applied to a small development sample, an optimizer will faithfully discover the best possible solution. The best solution for the development sample, however, may turn out to be a dreadful solution for the later sample on which genuine trades will be taken. Failure ensues, not because optimization has found a bad solution, but because it has found a good solution to the wrong problem!
Optimization on inadequate samples is also good at spawning solutions that represent nothing more than mathematical artifact. As the number of data points declines toward the number of free (adjustable) parameters, most models (trading, regression, or otherwise) will attain a perfect fit to even random data. The principle involved is the same one responsible for the fact that a line, which is a two-parameter model, can always be drawn through any two distinct points, but cannot always be made to intersect three arbitrary points. In statistics, this is known as the degrees-of-freedom issue: there are only as many degrees of freedom as there are data points beyond the number that can be fitted perfectly for purely mathematical reasons. Even when there are enough data points to avoid a totally artifact-determined solution, some part of the model fitness obtained through optimization will be of an artifact-determined nature, a by-product of the fitting process.
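To make the degrees-of-freedom point concrete, the short Python sketch below (a hypothetical illustration, not part of the original text; the sample size and seed are arbitrary) fits polynomials of increasing order to a handful of purely random points. Once the number of coefficients reaches the number of data points, the fit becomes perfect even though the data contain no structure at all:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5                      # number of (random) data points
x = np.arange(n, dtype=float)
y = rng.normal(size=n)     # pure noise: there is nothing real to model

# Fit polynomials with 2..n free parameters (degree + 1 coefficients each).
for degree in range(1, n):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    print(f"parameters={degree + 1}  residual SS={np.sum(resid**2):.6f}")

# With n parameters (degree n-1), the residual sum of squares collapses
# to ~0: a "perfect" fit that is purely a mathematical artifact.
```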
For multiple regression models, a formula is available that can be used to estimate how much “shrinkage” would occur in the multiple correlation coefficient (a measure of model fitness) if the artifact-determined component were removed. The shrinkage correction formula, which shows the relationship between the number of parameters (regression coefficients) being optimized, sample size, and decreased levels of apparent fitness (correlation) in tests on new samples, is shown below in FORTRAN-style notation:

RC = SQRT(1.0 - (1.0 - R*R) * (N - 1.0) / (N - P - 1.0))
In this equation, N represents the number of data points, P the number of model parameters, R the multiple correlation coefficient determined for the sample by the regression (optimization) procedure, and RC the shrinkage-corrected multiple correlation coefficient. The inverse formula, one that estimates the optimization-inflated correlation (R) given the true correlation (RC) existing in the population from which the data were sampled, appears below:

R = SQRT(1.0 - (1.0 - RC*RC) * (N - P - 1.0) / (N - 1.0))
These formulas, although legitimate only for linear regression, are not bad for estimating how well a fully trained neural network model (which is nothing more than a particular kind of nonlinear regression) will generalize. When working with neural networks, let P represent the total number of connection weights in the model. In addition, make sure that simple correlations are used with these formulas; if a neural network or regression package reports the squared multiple correlation, take the square root.
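As a concrete sketch, the Python functions below are my own translation of the two FORTRAN-style formulas above (the function names and the example figures are illustrative assumptions, not from the original text), including the two practical notes: use the total connection weight count for P with a neural network, and take the square root of any reported squared correlation first:

```python
import math

def corrected_r(r: float, n: int, p: int) -> float:
    """Shrinkage-corrected multiple correlation RC, given the sample
    correlation R, N data points, and P free parameters."""
    val = 1.0 - (1.0 - r * r) * (n - 1.0) / (n - p - 1.0)
    return math.sqrt(max(val, 0.0))  # clip: heavy shrinkage can push this below zero

def inflated_r(rc: float, n: int, p: int) -> float:
    """Inverse formula: the optimization-inflated sample correlation R
    expected when the true population correlation is RC."""
    val = 1.0 - (1.0 - rc * rc) * (n - p - 1.0) / (n - 1.0)
    return math.sqrt(val)

# Hypothetical example: a small neural net with 30 connection weights (P)
# trained on 200 data points (N). If the package reports a squared multiple
# correlation of 0.36, take the square root before applying the formula.
r = math.sqrt(0.36)                  # R = 0.6
print(corrected_r(r, n=200, p=30))   # about 0.50, noticeably below 0.6
```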