Asymptotic algorithm for computing the sample variance of interval data

Document Type: Research Paper


Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland


The problem of the sample variance computation for epistemic inter\-val-valued data is, in general, NP-hard. Therefore, known efficient algorithms for computing variance require strong restrictions on admissible intervals like the no-subset property or heavy limitations on the number of possible intersections between intervals. A new asymptotic algorithm for computing the upper bound of the sample variance in a feasible time is proposed. Conditions required for its application with finite samples are discussed and some properties of the algorithm are also given. It appears that our new algorithm could be effectively applied in definitely more situations than methods used so far.


[1] C. Angulo, D. Anguita, L. Gonzalez-Abril, J. A. Ortega, Support vector machines for interval discriminant analysis, Neuro-
computing, 71 (2008), 1220-1229.
[2] J. Antoch, R. Miele, Use of genetic algorithms when computing variance of interval data, In: B. Fichet et al. (Eds.), Clas-
sifi cation and Multivariate Analysis for Complex Data Structures, Studies in Classifi cation, Data Analysis, and Knowledge
Organization, Springer, 2011.
[3] A. Blanco-Fernandez, A. Colubi, M. Garca-Barzana, A set arithmetic-based linear regression model for modelling interval-
valued responses through real-valued variables, Information Sciiences, 247 (2013), 109-122.
[4] C. Cappelli, P. D'Urso, F. Di Iorio, Regime change analysis of interval-valued time series with an application to PM10,
Chemometrics and Intelligent Laboratory Systems, 146 (2015), 337-346.
[5] I. Couso, D. Dubois, Statistical reasoning with set-valued information: Ontic vs. epistemic views, International Journal of
Approximate Reasoning, 55 (2014), 1502-1518.
[6] E. Dantsin, V. Kreinovich, A. Wolpert, G. Xiang, Population variance under interval uncertainty: a new algorithm, Reliable
Computing, 12 (2006), 273-280.
[7] P. D'Urso, L. De Giovanni, R. Massari, Trimmed fuzzy clustering for interval-valued data, Advances Data Analysis and
Classi fication, 9 (2015), 21-40.
[8] P. D'Urso, P. Giordani, A least squares approach to principal component analysis for interval valued data, Chemometrics and
Intelligent Laboratory Systems, 70 (2004), 179-192.
[9] A. P. Duarte Silva, P. Brito, Discriminant analysis of interval data: An assessment of parametric and distance-based ap-
proaches, Journal of Classi fication, 32 (2015), 516-541.
[10] S. Ferson, L. Ginzburg, V. Kreinovich, L. Longpre, M. Aviles, Exact bounds on nite populations of interval data, Reliable
Computing, 11 (2005), 207-233.
[11] M. Gagolewski, Spread measures and their relation to aggregation functions, European Journal of Operational Research, 241
(2015), 469-477.
[12] A. Jalal-Kamali, V. Kreinovich, Estimating correlation under interval uncertainty, Mechanical Systems and Signal Process-
ing, 37 (2013), 43-53.
[13] A. Ko lacz, P. Grzegorzewski, Measures of dispersion for multidimensional data, European Journal of Operational Research,
251 (2016), 930-937.
[14] V. Kreinovich, S. Ferson, Computing best-possible bounds for the distribution of a sum of several variables is NP-hard,
International Journal of Approximate Reasoning, 41 (2006), 331-342.
[15] V. Kreinovich, H. T. Nguyen, B. Wu, On-line algorithms for computing mean and variance of interval data, and their use in intelligent systems, Information Sciences, 177 (2007), 3228-3238.
[16] V. Kreinovich, G. Xiang, Fast algorithms for computing statistics under interval uncertainty: An overview, In: V. N. Huynh et al. (Eds.), Interval/Probabilistic Uncertainty and Non-Classical Logics, Springer, 2008, 19-31.

[17] V. Kreinovich, G. Xiang, S. Ferson, Computing mean and variance under DempsterShafer uncertainty: Towards faster
algorithms, International Journal of Approximate Reasoning, 42 (2006), 212-227.
[18] R. E. Moore, Interval Analysis, Prentice-Hall, 1966.
[19] R. E. Moore, R. B. Kearfott, M. J. Cloud, Introduction to Interval Analysis, SIAM, 2009.
[20] H. T. Nguyen, V. Kreinovich, B. Wu, G. Xiang, Computing Statistics under Interval and Fuzzy Uncertainty, Springer, 2012.
[21] A. Oussous, F. Z. Benjelloun, A. A. Lahcen, S. Belfkih, Big Data technologies: A survey, Journal of King Saud University-
Computer and Information Sciences, 30 (2018), 431-448.
[22] A. B. Ramos-Guajardo, A. Colubi, G. Gonzalez-Rodrguez., Inclusion degree tests for the Aumann expectation of a random
interval, Information Sciences, 288 (2014), 412-422.
[23] A. B. Ramos-Guajardo, P. Grzegorzewski, Distance-based linear discriminant analysis for interval-valued data, Information
Sciences, 372 (2016), 591-60.
[24] B. Sinova, A. Colubi, M. A. Gil, G. Gonzalez-Rodrguez, Interval arithmetic-based linear regression between interval data:
Discussion and sensitivity analysis on the choice of the metric, Informtion Sciences, 199 (2012), 109-124.
[25] A. Skowron, A. Jankowski, S. Dutta, Interactive granular computing, Granular Computing, 1 (2016), 95-113.
[26] R. M. C. R. Souza, D. C. F. Queiroz, F. J. A. Cysneiros, Logistic regression-based pattern classi ers for symbolic interval
data, Pattern Analysis and Applications, 14 (2011), 273-282.
[27] T. Sunaga, Theory of interval algebra and its application to numerical analysis, RAAG Memoirs, Ggujutsu Bunken Fukuy-
kai, Tokyo, 2 (1958), 29-46, 547-564.
[28] S. A. Vavasis, Nonlinear Optimization: Complexity Issues, Oxford University Press, New York, 1991.
[29] H. Wang, Z. Xu, H. Fujita, S. Liu, Towards felicitous decision making: An overview on challenges and trends of Big Data, Information Sciences, 367-368 (2016), 747-765.
[30] M. Warmus, Calculus of approximations, Bulletin de l'Academie Polonaise de Sciences, 4 (1956), 253-257.
[31] G. Xiang, M. Ceberio, V. Kreinovich, Computing population variance and entropy under interval uncertainty: linear-time algorithms, Reliable Computing, 13 (2007), 467-488.