1st Int. Transborder Conf. of the Timor Island: Timor – Science without border
Dili, 7-8 May 2025

On Proportions As the Optimum Point of the Weighted Simpson Index

José Pinto Casquilho
Programa de Pós-Graduação e Pesquisa, Universidade Nacional Timor Lorosa’e, Timor-Leste
Email: josecasquilho7@gmail.com

Abstract

Proportions (or, equivalently, percentages) are used in all domains of knowledge and applications. In short, proportions measure the existence of the parts in a composition, denoting their relative presence: positive numbers that add up to unity (or 100%, using percentages). Yet, one may ask: with such a universal horizon, can we say that proportions are the optimal solution of any known mathematical framework? The answer is affirmative, at least in one case: proportions are the optimum point of the weighted Simpson index when one uses reciprocal weights.

The Simpson index was published in Nature in 1949, presented as a measure of the concentration of a classification defined with $n$ classes. The formula is the sum of the squared probabilities, $S = \sum_{i=1}^{n} p_i^2$, which gives the probability that two individuals selected at random, with replacement, belong to the same class. The maximum concentration occurs when all the presences are confined to a single class (a vertex of the simplex), and the index then equals 1. The minimum value of the index, corresponding to the lowest concentration of the classification, occurs with the uniform distribution, where all classes have the same probability (or relative frequency) $1/n$, and the index attains that same value: $\min S = 1/n$. Long before Edward H.
Simpson published his paper entitled “Measurement of diversity”, the formula of the index had already been in use since the 1930s in classified documents concerning cryptanalysis, where Solomon Kullback and William Friedman named it the ‘probability of monographic coincidence’. Also, the Simpson index $S$ is an inverse measure of diversity, and a direct measure can be built as $D = 1 - S$, which is commonly known as the Gini-Simpson index and was already used by the Italian statistician Corrado Gini in 1912.

The weighted Simpson index is a weighted version of the Simpson index $S$, obtained by incorporating a positive weight for each term of the sum, as if there were driving forces associated with the different combinations of pairs. Thus, with $w_i > 0$ for $i = 1, \dots, n$, one has $S_w = \sum_{i=1}^{n} w_i p_i^2$. The weighted Simpson index seems to have been first used by Nowak and May in 1992 as a Lyapunov function, to assess the stability of the equilibrium point(s) in the dynamics of competing virus strains in the context of HIV infections. The study of the optimal point of $S_w$, a minimum point of a differentiable convex function, was first published in 2024 in the journal Mathematics®. In the paper “On the optimal point of the weighted Simpson index”, equation (3) defines the optimal coordinates as $p_j^* = 1 \big/ \left( w_j \sum_{i=1}^{n} \frac{1}{w_i} \right)$ for $j = 1, \dots, n$.
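The formulas above can be illustrated with a minimal sketch. The function names and the weights `[1, 2, 4]` are assumptions chosen for illustration and do not come from the cited paper; exact rational arithmetic is used so that the values can be compared without rounding.

```python
from fractions import Fraction


def simpson(p):
    """Simpson index S: the sum of the squared probabilities."""
    return sum(pi * pi for pi in p)


def weighted_simpson(p, w):
    """Weighted Simpson index S_w: the sum of w_i * p_i^2."""
    return sum(wi * pi * pi for wi, pi in zip(w, p))


def optimal_point(w):
    """Coordinates p_j* = 1 / (w_j * sum_i 1/w_i), for positive weights."""
    total = sum(1 / wi for wi in w)
    return [1 / (wi * total) for wi in w]


# Illustrative weights (hypothetical, not taken from the paper).
w = [Fraction(1), Fraction(2), Fraction(4)]
n = len(w)

uniform = [Fraction(1, n)] * n
vertex = [Fraction(1), Fraction(0), Fraction(0)]

print(simpson(vertex))    # maximum concentration: 1
print(simpson(uniform))   # minimum concentration: 1/3, i.e. 1/n

p_star = optimal_point(w)
print(p_star)                        # coordinates 4/7, 2/7, 1/7
print(weighted_simpson(p_star, w))   # 4/7
print(weighted_simpson(uniform, w))  # 7/9, larger than the value at p_star
```

In this example the weighted index at `p_star` is smaller than at the uniform distribution, which is consistent with `p_star` being the minimum point; a single comparison is only a sanity check, not a proof.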
Also, if we define a random variable $W$ with values corresponding to the weights $\{w_i\}_{i=1,\dots,n}$ such that the law of probability is $\Pr[W = w_j] = p_j^*$, then the expected value of $W$ becomes $E[W] = \sum_{i=1}^{n} w_i p_i^*$, and the result is the harmonic mean of the weights: $E[W] = H(w)$.

In this presentation, it will be shown that, using reciprocal weights ($v_i = 1/w_i$), the optimal coordinates are the proportions relative to the sum of the reciprocal weights. More specifically, we can rewrite the optimal coordinate as $p_j^* = (1/w_j) \big/ \sum_{i=1}^{n} (1/w_i)$; then, taking the reciprocals of the original weights, $v_i = 1/w_i$, we obtain $p_j^* = v_j \big/ \sum_{i=1}^{n} v_i$, which is the proportion of the reciprocal weight $v_j$ in the sum of all reciprocal weights $\sum_{i=1}^{n} v_i$. And so, in this context, proportions can be seen as the optimal point of the weighted Simpson index with a (suitable) conversion of the original weights.
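As a small numerical check of the two statements above, the sketch below verifies that $E[W]$ equals the harmonic mean of the weights and that the optimal coordinates coincide with the proportions of the reciprocal weights. The helper names, the weights `[1, 2, 4]`, and the composition `q` are hypothetical values chosen for illustration only.

```python
from fractions import Fraction


def optimal_point(w):
    """Coordinates p_j* = (1/w_j) / sum_i (1/w_i), for positive weights."""
    total = sum(1 / wi for wi in w)
    return [(1 / wi) / total for wi in w]


def proportions(v):
    """Share of each positive value v_j in the total sum of v."""
    total = sum(v)
    return [vj / total for vj in v]


# Illustrative weights (hypothetical, not taken from the paper).
w = [Fraction(1), Fraction(2), Fraction(4)]
p_star = optimal_point(w)

# Expected value of W under Pr[W = w_j] = p_j*, against the harmonic mean H(w).
expected_w = sum(wi * pi for wi, pi in zip(w, p_star))
harmonic_mean = len(w) / sum(1 / wi for wi in w)
print(expected_w, harmonic_mean)  # both 12/7

# Reciprocal weights v_i = 1/w_i: their proportions equal the optimal point.
v = [1 / wi for wi in w]
print(proportions(v) == p_star)   # True

# Read in the other direction: any composition q with positive parts is the
# optimal point for the weights w_i = 1/q_i.
q = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
w_q = [1 / qi for qi in q]
print(optimal_point(w_q) == q)    # True
```

The last step is one way to read the "suitable conversion" of weights: starting from a given set of proportions, taking reciprocals yields weights for which those proportions are the optimum.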