Optimal Data Split for Model Validation

The decision to incorporate cross-validation into the validation of mathematical models raises an immediate question: how should one partition the data into calibration and validation sets? We answer this question systematically by presenting an algorithm that finds the optimal partition of the data subject to certain constraints. In doing so, we address two critical issues: 1) that the model be evaluated with respect to both its predictions of a given quantity of interest and its ability to reproduce the data, and 2) that the model be highly challenged by the validation set, assuming it is properly informed by the calibration set. The framework also relies on interaction among three parties: the experimentalist and/or modeler, who understands the physical system and the limitations of the model; the decision-maker, who understands and can quantify the cost of model failure; and the computational scientist, who strives to determine whether the model satisfies both the modeler’s and the decision-maker’s requirements.
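
As a rough illustration of the partition search described above, the following Python sketch enumerates every calibration/validation split of a small data set. It is a toy under stated assumptions, not the authors' algorithm: the straight-line model, the least-squares calibration, the max-residual data criterion, the "challenge" measure (how far the quantity-of-interest prediction from the calibration set alone lies from the prediction informed by all the data), and the names `fit_line`, `optimal_split`, `n_cal`, `x_qoi`, and `data_tol` are all hypothetical choices made for this example.

```python
from itertools import combinations

import numpy as np


def fit_line(x, y):
    """Least-squares fit of the toy model y = a*x + b; returns (a, b)."""
    a, b = np.polyfit(x, y, deg=1)
    return a, b


def optimal_split(x, y, n_cal, x_qoi, data_tol):
    """Exhaustively search calibration/validation splits (illustrative only).

    A split is admissible if the model calibrated on the calibration set
    reproduces the held-out validation data to within `data_tol`.
    Among admissible splits, return the one whose quantity-of-interest
    prediction (the model evaluated at `x_qoi`) differs most from the
    prediction obtained with all data, i.e. the most challenging split.
    Returns (challenge, cal_indices, val_indices), or None if no split
    is admissible.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.arange(len(x))

    # Reference QoI prediction from the model informed by all the data.
    a_all, b_all = fit_line(x, y)
    q_all = a_all * x_qoi + b_all

    best = None
    for cal in combinations(idx, n_cal):
        cal = np.array(cal)
        val = np.setdiff1d(idx, cal)

        # Calibrate on the calibration set only.
        a, b = fit_line(x[cal], y[cal])

        # Constraint: the calibrated model must reproduce the validation data.
        data_misfit = np.max(np.abs(a * x[val] + b - y[val]))
        if data_misfit > data_tol:
            continue

        # Assumed challenge measure: shift in the QoI prediction.
        challenge = abs(a * x_qoi + b - q_all)
        if best is None or challenge > best[0]:
            best = (challenge, cal, val)
    return best
```

For example, `optimal_split([0, 1, 2, 3, 4], [0.0, 1.1, 1.9, 3.2, 4.0], n_cal=3, x_qoi=10.0, data_tol=0.5)` returns the admissible split with the largest shift in the predicted quantity of interest. The exhaustive loop is only practical for small data sets, since the number of splits grows combinatorially.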

  • R. E. Morrison, C. M. Bryant, G. Terejanu, S. Prudhomme, and K. Miki, “Data partition methodology for validation of predictive models,” Computers & Mathematics with Applications, vol. 66, iss. 10, pp. 2114–2125, 2013.
    @article{MorrisonJ_CMA_2012,
    pages = {2114--2125},
    number = {10},
    volume = {66},
    year = {2013},
    journal = {{Computers \& Mathematics with Applications}},
    title = {{Data partition methodology for validation of predictive models}},
    author= {Rebecca E. Morrison and Corey M. Bryant and Gabriel Terejanu and Serge Prudhomme and Kenji Miki},
    }

  • G. Terejanu, “Predictive Validation of Dispersion Models Using a Data Partitioning Methodology,” in IMAC-XXXII Conference and Exposition on Structural Dynamics, Orlando, Florida, 2015, pp. 151–156.
    @CONFERENCE{TerejanuP_IMAC_2015,
    author = {Gabriel Terejanu},
    title = {{Predictive Validation of Dispersion Models Using a Data Partitioning Methodology}},
    booktitle = {{IMAC-XXXII Conference and Exposition on Structural Dynamics, Orlando, Florida}},
    year = {2015},
    pages = {151--156},
    month = {February},
    }

  • R. Morrison, C. Bryant, G. Terejanu, K. Miki, and S. Prudhomme, “Optimal Data Split Methodology for Model Validation,” in Proceedings of the World Congress on Engineering and Computer Science 2011 Vol II, WCECS 2011, 2011, pp. 1038–1043.
    @inproceedings{MorrisonP_WCECS_2011,
    author = {Rebecca Morrison and Corey Bryant and Gabriel Terejanu and Kenji Miki and Serge Prudhomme},
    title = {{Optimal Data Split Methodology for Model Validation}},
    booktitle = {{Proceedings of the World Congress on Engineering and Computer Science 2011 Vol II, WCECS 2011}},
    location = {San Francisco, USA},
    year = {2011},
    month = {October 19-21},
    pages = {1038--1043},
    ISSN = {2078-0958},
    }