- Our major concern is the lack of adequate statistical information on which the findings rely. The authors report that units were defined as "flight" or "assessment" cells by analyzing firing-rate variation associated with the behavioral events. In Figures 2 and 3, they show an apparent peak in responses right before flight for "assessment+" cells and right before flight initiation for "flight+" cells, and in the Methods section they state that these categories were defined using a Wilcoxon rank-sum test. The authors report only a p-value for this analysis. They do not give the test statistic, the effect size, or the number of trials and units compared, so it is hard to judge whether the classification is accurate. Perhaps using auto-correlograms would increase classification accuracy.
- A related issue is that it is not clear whether the classification was made at the within-individual level (i.e., for each mouse) or at the between-individual level (i.e., pooling all mice). This matters because, with only 4-8 mice per region, statistical power is low and can reach an adequate level only by pooling individual neurons across mice. However, this constitutes pseudo-replication: neurons recorded from the same animal are treated as independent observations, which can considerably inflate effect sizes and deflate p-values, raising the false-positive rate. This lack of clarity makes it difficult to judge the replicability and generalizability of the findings.
- Although it is understandable that datasets and analysis scripts were not shared before publication because of concerns about being scooped, these materials can be shared privately with journal referees. This would allow referees to assess the computational reproducibility of the statistical model used to classify cells, and therefore the robustness of the findings. We strongly recommend that the authors do so when they submit the paper to a journal, and that they also make these materials available to readers after publication.
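To illustrate the level of reporting we have in mind, the following is a minimal Python sketch (standard library only) of a per-unit Wilcoxon rank-sum classification. It keeps the U statistic, z, the p-value, the effect size r = z/√N and the sample sizes alongside each label. It assumes, hypothetically, that each unit is summarized by per-trial firing rates in an event window and in a baseline window. The function names, the "event+"/"event-" labels and the α = 0.05 threshold are our own illustrative choices, not the authors' pipeline. It uses the normal approximation with a tie correction and no continuity correction, which is only reasonable with a moderate number of trials per window.

```python
import math


def rank_sum_test(event_rates, baseline_rates):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation
    with tie correction. Returns U, z, p, effect size r and sample sizes."""
    n1, n2 = len(event_rates), len(baseline_rates)
    n = n1 + n2
    # Pool both samples, remembering which one each value came from (0 = event).
    pooled = sorted([(v, 0) for v in event_rates] + [(v, 1) for v in baseline_rates])
    ranks = [0.0] * n
    tie_sum = 0  # sum of (t^3 - t) over tied groups, used in the variance correction
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1
    rank_sum_event = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_event - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    if var <= 0:  # all values identical: no evidence either way
        return {"U": u, "z": 0.0, "p": 1.0, "r": 0.0, "n_event": n1, "n_baseline": n2}
    z = (u - mu) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return {"U": u, "z": z, "p": p, "r": z / math.sqrt(n), "n_event": n1, "n_baseline": n2}


def classify_unit(unit_id, event_rates, baseline_rates, alpha=0.05):
    """Label a unit (hypothetical labels/threshold) and keep every statistic
    needed to report how the decision was reached."""
    res = rank_sum_test(event_rates, baseline_rates)
    if res["p"] < alpha:
        label = "event+" if res["z"] > 0 else "event-"
    else:
        label = "unclassified"
    return {"unit": unit_id, "label": label, **res}


if __name__ == "__main__":
    # Made-up per-trial rates (Hz), for illustration only; not data from the manuscript.
    event = [12.1, 15.3, 11.8, 14.0, 16.2, 13.5, 12.9, 15.8, 14.4, 13.1]
    baseline = [8.2, 9.1, 10.4, 7.9, 9.8, 11.0, 8.6, 9.3, 10.1, 8.8]
    print(classify_unit("unit_01", event, baseline))
```

A table with these fields for every unit (and for every mouse) would let readers see how far each unit sits from the classification boundary, rather than only whether it crossed it.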
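To show why the level of analysis matters, the sketch below simulates two groups of mice with no true group difference but with animal-level variability shared by all neurons recorded from the same mouse. It then compares the false-positive rate of a test that treats each neuron as independent (pseudo-replication) with that of an exact permutation test on per-mouse means. The group sizes (4 mice per group, 30 neurons per mouse), the variances, the choice of tests and all function names are illustrative assumptions, not values or methods taken from the manuscript.

```python
import itertools
import math
import random
from statistics import mean, variance


def simulate_experiment(rng, n_mice=4, neurons_per_mouse=30, mouse_sd=1.0, neuron_sd=1.0):
    """Two groups of mice with NO true group difference; neurons share their mouse's offset."""
    groups = []
    for _ in range(2):
        mice = []
        for _ in range(n_mice):
            offset = rng.gauss(0.0, mouse_sd)  # animal-level variability
            mice.append([offset + rng.gauss(0.0, neuron_sd) for _ in range(neurons_per_mouse)])
        groups.append(mice)
    return groups


def neuron_level_p(groups):
    """Pooled two-sample z-test treating every neuron as independent (pseudo-replication)."""
    a = [v for mouse in groups[0] for v in mouse]
    b = [v for mouse in groups[1] for v in mouse]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation


def mouse_level_p(groups):
    """Exact permutation test on per-mouse means: the mouse is the unit of replication."""
    means = [mean(m) for m in groups[0]] + [mean(m) for m in groups[1]]
    k = len(groups[0])
    observed = abs(mean(means[:k]) - mean(means[k:]))
    count = total = 0
    for idx in itertools.combinations(range(len(means)), k):
        chosen = set(idx)
        g1 = [means[i] for i in chosen]
        g2 = [means[i] for i in range(len(means)) if i not in chosen]
        if abs(mean(g1) - mean(g2)) >= observed - 1e-12:  # tolerance for float rounding
            count += 1
        total += 1
    return count / total


def false_positive_rates(n_sims=200, alpha=0.05, seed=1):
    """Fraction of null experiments declared 'significant' by each analysis."""
    rng = random.Random(seed)
    hits_neuron = hits_mouse = 0
    for _ in range(n_sims):
        groups = simulate_experiment(rng)
        hits_neuron += neuron_level_p(groups) < alpha
        hits_mouse += mouse_level_p(groups) < alpha
    return hits_neuron / n_sims, hits_mouse / n_sims


if __name__ == "__main__":
    neuron_rate, mouse_rate = false_positive_rates()
    print(f"false-positive rate, neurons as units: {neuron_rate:.2f}")
    print(f"false-positive rate, mice as units:    {mouse_rate:.2f}")
```

Under these settings the neuron-level test rejects the null far more often than the nominal 5%, while the exact mouse-level test stays at or below it. A mixed-effects model with mouse as a random effect would be one common way to use all recorded neurons without pseudo-replication.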