Summary of reviews
- Sofia Ira Ktena: 1 (2)
- Anonymous Reviewer: 1 (2)
Review 1 (Sofia Ira Ktena)
- Reviewer's confidence: 2 (Knowledgeable)
- Overall recommendation: 1 (Probably Accept)
SUMMARY
This paper presents a new method for performing classification analysis on structural brain networks. The paper is well-structured and gives a neat introduction to the method. Also, a thorough evaluation of the method is presented using a disease classification task.
STRENGTHS
The paper presents a novel method for performing classification on structural networks by combining existing methodologies from pattern recognition and machine learning. The steps of the classification pipeline are presented clearly and the relevant definitions are given. The authors mention comparing to alternative kernel-based approaches and compare their classification results to other state-of-the-art methods (e.g. graph embedding). Additionally, the authors explore different choices for defining the edge weights of the networks. They also evaluate the performance of the method in a cross-validation setting, which is a reasonable choice that provides a good trade-off between bias and variance. They also seem to test the suitability of different regularisation parameters for the SVM classifier, although they do not report these results. Finally, most of the results are nicely illustrated with figures.
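The cross-validated kernel-SVM evaluation described above could be sketched roughly as follows. This is an illustration only, not the authors' code: the function name `cv_auc` and the assumption that `gram` is a precomputed subject-by-subject kernel matrix (e.g. derived from pairwise distances) are mine.

```python
# Hedged sketch of a cross-validated evaluation of a precomputed-kernel SVM.
# `gram` is assumed to be an n_subjects x n_subjects kernel (Gram) matrix;
# `y` holds binary class labels. Names here are illustrative.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

def cv_auc(gram, y, C=1.0, n_splits=5, seed=0):
    """Mean ROC AUC of a precomputed-kernel SVM over stratified folds."""
    aucs = []
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in cv.split(gram, y):
        clf = SVC(kernel="precomputed", C=C)
        # train on the train-vs-train block of the Gram matrix
        clf.fit(gram[np.ix_(train, train)], y[train])
        # score test subjects against the training subjects
        scores = clf.decision_function(gram[np.ix_(test, train)])
        aucs.append(roc_auc_score(y[test], scores))
    return float(np.mean(aucs))
```

Sweeping `C` over a grid with such a helper is one plausible way the regularisation-parameter check mentioned above could be run.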
SHORTCOMINGS
The contributions of the paper are not explicitly stated (e.g. novelty, computational complexity, classification accuracy). Also, alternative choices of graph kernels for brain network classification are not mentioned, although several kernel methods have recently been proposed in neuroimaging studies; a more detailed literature review would have been useful. Additionally, the purpose of the figures is not always clear. Another point is that, although a previous comparison to probabilistic distance measures is mentioned, the results are not illustrated in the paper. The same holds for the different regularisation parameters for the SVM classifier, for which results are not compared or discussed in detail. The limitations of the method are not discussed, nor is its applicability to different kinds of networks (e.g. networks with different numbers of nodes, different parcellations, or functional networks).
CONSTRUCTIVE FEEDBACK
The language of the paper could be improved in the abstract and introduction. Also, figures would be more informative if they were coloured rather than greyscale. In the introduction, the part that refers to network construction and how the weights are proportional to the streamline count should highlight that this applies to structural networks only. It would also be good to keep variable names consistent across the paper, making sure that the number of nodes (n) and the embedded vector length (n*(n-1)/2 for an undirected network) are always referred to by the same names. It would also be nice to briefly mention the benefits of the methodology at the end of the introduction. In sections 2 and 3 the Background and Related Work are a bit mingled, and the part where you state that 'The spectra of each of these matrices can be interpreted in their own way' is not very clear. Figure 1 is a bit vague too; a better explanation in the caption would be helpful. In formula (5) it is not clear what f_kl represents. Additionally, it would be nice to add references to the FACT algorithm (section 4) and scikit-learn (section 5.2). The title of section 5.1 is a bit misleading, since you mostly explain 'Network Construction' or 'Definition of Network Edge Weights'. In section 6.2 you mention that using both streamline count and physical distances improves classification accuracy, but you do not explain why - could it be because of a bias of the tractography method against longer streamlines? Furthermore, in the same section you mention that the EMD-based kernel is better than probabilistic kernels, but in section 3.2 you mention achieving 0.8 ROC AUC before (whereas the best reported ROC AUC is ~0.71 for the proposed method). The applicability of the method to different kinds of networks (e.g. different numbers of nodes, different parcellations, networks derived from correlation analysis of functional data) could be discussed by extending the 'Conclusions'.
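For reference, the n*(n-1)/2 embedded vector length mentioned above comes from taking the strictly upper triangle of a symmetric weight matrix. A minimal sketch (the function name is illustrative, not from the paper):

```python
# Minimal sketch of the graph-embedding baseline: an undirected network on
# n nodes, stored as a symmetric n x n weight matrix, flattens to a feature
# vector of length n*(n-1)/2 (diagonal excluded).
import numpy as np

def embed_upper_triangle(W):
    """Vectorize a symmetric weight matrix via its strict upper triangle."""
    n = W.shape[0]
    iu = np.triu_indices(n, k=1)      # strictly upper-triangular indices
    v = W[iu]
    assert v.size == n * (n - 1) // 2
    return v

# toy 4-node network: embedding length is 4*3/2 = 6
W = np.random.rand(4, 4)
W = (W + W.T) / 2                     # symmetrize
v = embed_upper_triangle(W)
print(v.shape)                        # (6,)
```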
Review 2 (Anonymous)
- Reviewer's confidence: 2 (Knowledgeable)
- Overall recommendation: 1 (Probably accept)
SUMMARY
This is an interesting submission employing an established metric to characterise brain networks for the purpose of classification, and I enjoyed reading it. Focusing on methodology, the authors explain the stages clearly. What is lacking is the context of this work in relation to other literature and its performance in relation to the alternative distance metrics that the authors refer to. I feel there is enough novelty to accept this work, which can be further improved via the changes/additions listed under "Constructive feedback" (and "Shortcomings", if possible).
STRENGTHS
Innovative use of EMD in this work to classify spectra.
Good demonstration of the similarity of spectra in relation to the work by de Lange (2014) for both random/simulated and real data networks.
Clearly written.
SHORTCOMINGS
It is reasonable to use the number of tracts and distances for network weights; however, I feel it would be beneficial to include networks weighted by other metrics such as FA.
In section 3.2 the authors describe previous, related work involving two other distance measures: Kullback-Leibler divergence and Jensen-Shannon distance. It is unclear from their description what the classification performance is: it seems to range between 0.8 and 0.65, with no information on which distance measure or cohort/experiment this applies to. As such, given the similarity between EMD and Kullback-Leibler divergence/Jensen-Shannon distance, it would be beneficial to compare the performance of all three distance measures for this work. In Results 6.2, the authors state that classification performance is better with the EMD-based kernel than with these other methods, but in section 5.2 an optimal AUC score of 0.8 was given, whereas the best AUC result with EMD was 0.7. Either be more explicit about the relevant results from the other paper, or include a comparison here with results to clarify/support the statements made in Results/Conclusion.
As this work is focused on methodology, it may be acceptable to omit discussing the value of the work presented here in the context of the other literature that tackles the classification of autism versus control (of which there is plenty). But this is all the more reason for clarity in presenting the EMD results with respect to other distance measures.
CONSTRUCTIVE FEEDBACK
- Abstract needs some attention: Rather than listing the methodological steps, it should first clearly pose the problem or aim. After highlighting the key contributions/methodology, there should be a summary of results and a concluding sentence.
- For completeness, including a table of classification scores (i.e. Accuracy, Sensitivity (recall), Specificity) for all tests would be beneficial in addition to the AUC.
- In section 3.2, the authors seem to wish to point out that EMD has the advantage over other distance measures mentioned as it mitigates the need for defining arbitrary/optimal bin sizes. This advantage could be made clearer later in this section.
- Section 5.2, paragraph 2: "Second, we vectorize a matrix by taking the values of its upper triangle..." Explicitly state which matrix you mean here (network, Laplacian, etc.). Fig. 3 suggests this to be the normalised connectome matrix.
- Figure 2: which connectomes were these spectra calculated from for the real brain data: tract-count, distance, or tract-count/distance networks?
- Label the axes of the spectral plots in Figures 1 and 2. Labelling the groups on the Gram matrix axes in Figure 4 would also help; the lines delineating them are unclear.
- For a fuller publication, there is scope for further discussion of why EMD would perform better than other methods and why an analysis of the eigenvalue spectra would do better than the other baseline features tested in this work.
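The bin-free property raised in the feedback above can be made concrete: the 1-D EMD (Wasserstein distance) compares two eigenvalue sets directly, whereas KL divergence or Jensen-Shannon distance first require histograms with a chosen bin size. A sketch under my own assumptions (the helper name and the toy random networks are illustrative, not from the paper):

```python
# Sketch: EMD between normalized-Laplacian spectra of two weighted graphs,
# computed directly on the eigenvalue sets -- no histogram binning needed,
# unlike KL divergence or Jensen-Shannon distance.
import numpy as np
from scipy.stats import wasserstein_distance

def normalized_laplacian_spectrum(W):
    """Sorted eigenvalues of L = I - D^{-1/2} W D^{-1/2} for symmetric W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.where(d > 0, d, 1.0))  # guard zero degrees
    L = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.sort(np.linalg.eigvalsh(L))

# two toy weighted "connectomes" on 10 nodes
rng = np.random.default_rng(0)
A = rng.random((10, 10)); A = (A + A.T) / 2; np.fill_diagonal(A, 0)
B = rng.random((10, 10)); B = (B + B.T) / 2; np.fill_diagonal(B, 0)

# the normalized-Laplacian spectrum lies in [0, 2], so the EMD is bounded
d = wasserstein_distance(normalized_laplacian_spectrum(A),
                         normalized_laplacian_spectrum(B))
```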
Rebuttal
We thank the reviewers for their detailed comments. We have addressed most of the questions and suggestions.
The exceptions are:
- we could not provide a more detailed literature review (as suggested by Reviewer 1) due to the 10-page limit of the paper; we only added references to works that review graph kernels.
- for the same reason, we did not add details on how the choice of the SVM penalty parameter affected the results. Overall, the effect of this parameter was not large: for all models, the curves of ROC AUC values plotted against the penalty parameter plateaued after an initial interval of slight growth.
- we could not try other weighting schemes, such as weighting by FA (as suggested by Reviewer 2). We worked with a ready-made set of connectomes, for which only the original weights and the coordinates of zone centers were available. We certainly agree that evaluation of the normalized Laplacian spectra for different weighting schemes is of particular interest. We commented on this in Conclusions.
- we did not add a table of precision, recall and accuracy values for all models (as suggested by Reviewer 2); we could not fit these details within the strict 10-page limit of the paper.
We believe we addressed the remaining comments of the reviewers.