Sum of Ranking Differences – An innovative statistical comparison method

(Preview of a project in the M.Sc. program "Computer Science", Kempten University, winter semester 2020/21)

Lecturer in charge at Kempten University: Prof. Dr. Jochen Staudacher

Content

Sum of Ranking Differences (SRD) is a novel statistical method that ranks competing solutions based on a reference point. The reference might arise naturally or can be aggregated from the data. The method originates from chemistry, where various properties of a substance, measured in different laboratories (or by various methods), have to be compared. For instance, the East/West EU product quality scandal of 2017, better known as #Nutellagate, was resolved by analysing product samples from both sides of Europe. SRD makes such comparisons simple. As a result, it is rapidly gaining popularity in various fields of applied science, such as analytical chemistry, pharmacology, decision making and social choice.

The input of an SRD analysis is a matrix in which all columns except the last represent the different solutions (measurement techniques, Pareto-optimal outcomes), while the rows represent the measured variables (properties). The last column has a special role: it contains the benchmark values, called references, which form the basis of comparison. From the input matrix we compose a ranking matrix by replacing each value in a column by its rank in order of magnitude. The SRD value of a solution is then obtained by taking the absolute differences between its column of ranks and the reference ranking and summing them up.

SRD is not solely a distance metric but a composite procedure that includes data fusion and validation steps. Validation comprises two procedures. The permutation test (also called randomization test) shows whether the rankings differ significantly from a ranking taken at random. The second procedure, cross-validation, assigns uncertainties to the SRD values: leave-one-out cross-validation is applied if the input matrix has fewer than 14 rows, and leave-many-out cross-validation is applied for larger numbers of rows.
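The ranking step and the permutation test described above can be sketched in a few lines. The sketch is written in Python purely for illustration (the project software itself is to be written in R), and the function names average_ranks, srd_values and permutation_p_value are hypothetical. Two assumptions are made here that the text does not prescribe: tied values share their mean rank, and the permutation test is approximated by Monte Carlo sampling of tie-free random rankings.

```python
import random


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def srd_values(matrix):
    """matrix: list of rows; the last column holds the reference values.
    Returns one SRD value per solution column."""
    columns = [list(col) for col in zip(*matrix)]
    ref_ranks = average_ranks(columns[-1])
    return [
        sum(abs(r - q) for r, q in zip(average_ranks(col), ref_ranks))
        for col in columns[:-1]
    ]


def permutation_p_value(srd, n_rows, n_draws=10000, seed=0):
    """Estimated share of random rankings whose SRD is at most `srd`.
    Assumption: rankings without ties, sampled uniformly at random."""
    rng = random.Random(seed)
    reference = list(range(1, n_rows + 1))
    hits = 0
    for _ in range(n_draws):
        shuffled = reference[:]
        rng.shuffle(shuffled)
        if sum(abs(a - b) for a, b in zip(shuffled, reference)) <= srd:
            hits += 1
    return hits / n_draws


# Made-up data: 5 measured variables (rows), solutions A and B, then the reference.
data = [
    [1.0, 5.0, 1.1],
    [2.0, 4.0, 2.0],
    [3.0, 3.0, 2.9],
    [4.0, 2.0, 4.2],
    [5.0, 1.0, 5.0],
]
for name, srd in zip(["A", "B"], srd_values(data)):
    print(name, srd, permutation_p_value(srd, n_rows=len(data)))
```

In this made-up example, solution A reproduces the reference ranking exactly (SRD 0), while solution B reverses it and reaches 12, the largest value possible for five rows. A small estimated share for A suggests that its agreement with the reference is unlikely to be accidental; a share near 1 for B means almost every random ranking does at least as well.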

The students' tasks will include the implementation of:

  • SRD;

  • Data preprocessing techniques;

  • Validation procedures;

  • Various plots related to SRD;

  • Heatmap based on the pairwise distance of the solutions;

  • (if time allows) Variants of SRD.
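As a rough illustration of the heatmap item, the sketch below computes the matrix of pairwise distances that such a plot would display. It rests on one possible reading, assumed here and not taken from the announcement: the distance between two solutions is the sum of absolute differences of their rank columns, as if each solution served in turn as the reference. The sketch is in Python for illustration only (the project itself targets R), and the names ranks and pairwise_srd are hypothetical.

```python
def ranks(values):
    """Ascending ranks (1 = smallest); tied values share their mean rank."""
    ordered = sorted(values)
    # A tied block occupies 1-based positions first..last in the sorted list;
    # every member of the block receives the mean of first and last.
    return [
        (ordered.index(v) + 1 + len(ordered) - ordered[::-1].index(v)) / 2
        for v in values
    ]


def pairwise_srd(matrix):
    """matrix: rows = measured variables, columns = solutions (no reference column).
    Returns a symmetric matrix of rank distances between the solutions."""
    rank_cols = [ranks(list(col)) for col in zip(*matrix)]
    return [
        [sum(abs(a - b) for a, b in zip(ci, cj)) for cj in rank_cols]
        for ci in rank_cols
    ]


# Made-up data: 5 measured variables (rows) and three solutions (columns).
solutions = [
    [1.0, 5.0, 1.5],
    [2.0, 4.0, 2.5],
    [3.0, 3.0, 2.0],
    [4.0, 2.0, 4.5],
    [5.0, 1.0, 5.0],
]
distances = pairwise_srd(solutions)
for row in distances:
    print(row)
```

The resulting matrix has zeros on the diagonal and is symmetric, so it can be passed directly to any heatmap routine; solutions that rank the variables similarly appear as low-valued cells.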

The project is jointly supervised by Prof. Dr. Jochen Staudacher, lecturer in charge at Kempten University, Dr. Balázs Sziklai from Corvinus University of Budapest, Dr. Attila Gere from the Hungarian Academy of Sciences, and Prof. Dr. Károly Héberger from the Research Centre for Natural Sciences, ELKH, Hungary. The software, which will be written in the R programming language, will be distributed under the GNU General Public License, version 3 (GPL-3).

Literature

[1] Héberger K. Sum of ranking differences compares methods or models fairly. TrAC Trends in Analytical Chemistry. 2010; 29(1):101–109. https://doi.org/10.1016/j.trac.2009.09.009

[2] Kollár-Hunek K, Héberger K. Method and model comparison by sum of ranking differences in cases of repeated observations (ties). Chemometrics and Intelligent Laboratory Systems. 2013; 127:139–146. http://dx.doi.org/10.1016/j.chemolab.2013.06.007

[3] Sziklai BR, Héberger K. Apportionment and districting by Sum of Ranking Differences. PLOS ONE. 2020; 15(3):e0229209. https://doi.org/10.1371/journal.pone.0229209