__ Summary and Contributions__: This paper develops a multivariate variant of a persistence image,
a feature representation in topological data analysis (TDA). In contrast
to previous work in TDA, which is only capable of analysing univariate
data (specifically, only scalar-valued filtrations are supported), this
work shows how to combine a 'sliced' variant of multi-parameter
persistent homology with a fixed-size feature vector to permit the usage
of topological features in standard machine learning contexts.
In addition to a brief discussion of the stability properties of the
novel method, the paper also analyses the empirical properties and
behaviour of multi-parameter persistence images. Experiments are
conducted in two very different areas, namely time series classification
and image analysis.

__ Strengths__: Since multivariate functions are abundant and TDA has yet to fully
encompass this setting, the paper discusses a highly important topic.
The main strength of the paper is the smart combination of existing
techniques, i.e. sliced or fibred barcodes and vineyards, and the
development of a new representation, inspired by persistence images.
I expect that this paper will provide new avenues for future research in
TDA, and it might inspire more researchers to look into this topic,
since many TDA applications are somewhat 'plagued' by univariate
filtrations (and at least the bivariate case appears to be rather common
in applications).

__ Weaknesses__: Even though I feel favourable about this paper, there are two issues
that prevent me from fully endorsing it. First, the paper is lacking
clarity in some of the definitions. Second, the experimental section is
somewhat problematic.
Please see Section 5 for details on clarity. In the following, I will
comment on the experimental section.
- More details on the DTM calculation are required to make this paper
accessible to non-experts. What is the stability of the proposed
method with respect to these parameters?
- When discussing the 'bounding rectangle' in l. 261, does this refer to
the maximum and minimum function values of the filtration?
- Why are there no standard deviations in Table 1? The text specifies
that a cross validation procedure was used (and most of the other
tables contain standard deviations), so they should also be shown.
- Related to that, the paper should be more clear about the 'baseline'
classification accuracy that is to be expected for Table 1. I briefly
checked a few data sets, and according to 'timeseriesclassification.com',
some values are more than 10% lower than the state of the art. Of
course, I do not expect the paper to outperform all existing methods,
but in the interest of fairness, details about what constitutes
a 'good' result should be added.
Having an interest in TDA, I feel strongly about this: applications of
TDA should always include other methods as well in order to
demonstrate to what extent topological features can be helpful.
Additional details on the baselines would strengthen the paper
immensely.
As a simple suggestion, why not add a baseline with the Euclidean
distance or the dynamic time warping distance? While such classifiers are of course
not competitive in all cases, they can provide important insights.
- Finally, I would be interested in knowing the stability with respect
to choosing a set of lines. I understand the theoretical stability
results, but I would be interested in how this choice is being
performed in practice.
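
The 1-NN/DTW baseline suggested above is cheap to set up; a minimal sketch (function names are my own, not from the paper, and the DTW here is the plain quadratic dynamic-programming variant without a warping window) could look like:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW distance."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping steps
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def knn_predict(train_X, train_y, query, metric=dtw_distance):
    """1-nearest-neighbour classification under the given distance."""
    dists = [metric(x, query) for x in train_X]
    return train_y[int(np.argmin(dists))]
```

Swapping `metric` for the Euclidean distance gives the second suggested baseline for free.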

__ Correctness__: All claims in the paper are technically correct. Terminology is used
consistently throughout the paper. As far as I can tell, the theoretical
claims in the supplementary materials are also correct.
There are some minor issues with the definitions. For example, in
Definition 3.1, $N$ and $n$ are not used consistently; to my
understanding, Figure 2 should use $N$ instead of $n$. Moreover, why
does the number of lines differ between the three variants? Would it not
also make sense to use $N$ lines with the same slope?

__ Clarity__: The aforementioned lack of clarity is a weakness of the paper. Since
I feel favourable about the topic and the paper, I think this lack of
clarity could deter non-experts from fully understanding this paper.
Here are some suggestions to improve clarity and accessibility:
- In the abstract, consider adding details about *why* the multivariate
or multiparameter setting is not as well understood; mentioning that
no unique representations or decompositions exist here would be
useful.
- The discussion of shortcomings of persistence diagrams in l. 44-- is
somewhat vague. Consider rewriting this and the subsequent sentences.
One shortcoming of the 'original space' of persistence diagrams, for
example, is the lack of fixed-size representations, making their use
cumbersome for machine learning.
(moreover, I think the complexity of distance metrics exacerbates this
issue; the authors might want to consider mentioning this)
- In the Background section, consider adding more explanations about
*how* the topological features are being calculated. Is
a Vietoris--Rips expansion used, for example?
- When introducing multiparameter persistent homology, I would suggest
using different notation than $\leq$ when discussing
high-dimensional vectors. The switch between the univariate and
multivariate setting should be indicated in the symbols being used;
for example, why not use boldface (e.g. $\mathbf{v}$) to indicate vectors,
leaving the normal font for scalars?
- I would also define the line $l(t)$ as a function of $t$ instead of
a set; currently, it is defined as a set, but used as a function; in
particular, the usage in l. 187 is redundant. I would suggest sticking
to a function definition.
- Moreover, there is a clash of notation with $l$ in Eq. 2; please
consider using a different symbol here.
- I would suggest improving the exposition in Definition 3.2; consider
providing an example of $D_L(f)$, maybe by showing an illustration or
extending Figure 3 accordingly? More specifically, Figure 3 could show
the matching or the barcode as an additional inlay.
- Consider showing an example of Eq. 2; since the paper draws on the
persistence image calculation, it would be helpful to illustrate the
calculation process. Figure 4 already describes this to some extent,
but in this figure, more details (and axis labels) would be helpful.
Since TDA is not a well-known framework (yet!), a more 'gentle'
introduction to these concepts would be appreciated by readers.
(the use of the matching information could be illustrated directly,
for example, since this is one of the unique characteristics of this
paper)
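
To illustrate the kind of 'gentle' introduction I have in mind: the single-parameter persistence image that the paper draws on can be sketched in a few lines (a simplified variant with a fixed grid and persistence weighting; parameter names are mine, not the paper's):

```python
import numpy as np

def persistence_image(diagram, resolution=10, sigma=0.1, extent=(0, 1, 0, 1)):
    """Simplified persistence image: each (birth, death) pair becomes a
    Gaussian bump centred at (birth, persistence), weighted by its
    persistence, and evaluated on a fixed grid."""
    xs = np.linspace(extent[0], extent[1], resolution)
    ys = np.linspace(extent[2], extent[3], resolution)
    gx, gy = np.meshgrid(xs, ys)
    img = np.zeros((resolution, resolution))
    for birth, death in diagram:
        pers = death - birth
        img += pers * np.exp(-((gx - birth) ** 2 + (gy - pers) ** 2) / (2 * sigma ** 2))
    return img
```

A figure walking through exactly this kind of computation, with axis labels, would make Eq. 2 much more accessible.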

__ Relation to Prior Work__: Prior work is adequately referenced for the most part. There are however
more machine learning papers from recent conferences that showcase TDA
applications (for, among others, graph classification). Consider citing
additional papers [this is not required, but for machine learning
readers, it might be more relevant to see additional TDA applications in
ML, rather than general TDA applications]:
- Hofer et al., Deep Learning with Topological Signatures (NeurIPS 2017,
https://arxiv.org/abs/1707.04041): to my knowledge, the first
publication that combines persistent homology with deep learning, and
discusses graph analysis approaches
- Kim et al., Efficient Topological Layer based on Persistent Landscapes
(https://arxiv.org/abs/2002.02778): a recent preprint on combining
persistence landscapes and neural networks
- Rieck et al., A Persistent Weisfeiler-Lehman Procedure for Graph
Classification (ICML 2019, http://proceedings.mlr.press/v97/rieck19a.html):
a TDA-based approach to graph analysis

__ Reproducibility__: Yes

__ Additional Feedback__: ---
Update after rebuttal:
I thank the authors for their rebuttal and I appreciate the additional clarifications, which I think will help improve the manuscript! The discussions with the other reviewers raised some additional concerns, in particular about additional (persistence-based) baselines, so I am keeping my score as it is. I would urge the authors to emphasise the suggested application of the paper slightly better: as stated in the rebuttal, if the primary target/application is immunofluorescence images, the paper should put a larger focus on this.
It would strengthen the submission immensely if the authors could demonstrate that:
- the immunofluorescence imaging application requires topological information (in other words, topology-based methods are improving upon existing baselines)
- the bi-filtration & the proposed representation are necessary to obtain a certain level of performance
I am mentioning this because the computational requirements of persistent homology are already quite high to begin with, so adding additional complexity needs to be justified by performance, interpretability, or some other benefits.
Finally, I want to reiterate that I feel positively about this paper, and I think with additional changes, it has the benefit of making a strong contribution to machine learning!
---
Despite my criticisms from above, I feel favourably about this paper and
think that, with some improvements, this could be a very strong addition
to the TDA literature.
Here are some additional suggestions and typos:
- 'lower bounds the interleaving distance' --> 'serves as a lower bound for the interleaving distance'
- 'with same slope' --> 'with the same slope'
- 'in two sense' --> 'in a two-fold sense'
- 'Lipshitz' --> 'Lipschitz'
- The citation for Landi (2014) seems incomplete
- I would suggest not to use citations as nouns. Instead of writing 'See
[LW15]' I would suggest writing 'See Lesnick and Wright [LW15]' for
more details.

__ Summary and Contributions__: This paper introduces "Multiparameter Persistence Images", a summary representation of multiparameter persistent homology. The idea is to compute fibered barcodes from a multi-filtration by restricting the filtration to lines. This can be computed via the vineyard algorithm and turned into an image representation with a fixed resolution to be used in downstream ML tasks. Experiments are presented on different datasets where the authors compare against recent multiparameter persistence approaches. The proposed method compares favorably.
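
For concreteness, the slicing step can be sketched as follows (a hypothetical helper, not the authors' implementation; it assumes lines with strictly positive direction components):

```python
import numpy as np

def restrict_to_line(filter_values, basepoint, direction):
    """Push a 2-parameter filtration onto the line l(t) = basepoint + t * direction.

    Each point with bifiltration value (f1, f2) enters the restricted
    1-parameter filtration at the smallest t for which l(t) dominates
    (f1, f2) coordinate-wise; assumes direction > 0 componentwise.
    """
    f = np.asarray(filter_values, dtype=float)
    b = np.asarray(basepoint, dtype=float)
    d = np.asarray(direction, dtype=float)
    # smallest t with b[k] + t * d[k] >= f[k] for every coordinate k
    return ((f - b) / d).max(axis=1)
```

Feeding these restricted values into any single-parameter persistence routine yields one fibered barcode per line; the vineyard algorithm then matches bars across nearby lines.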

__ Strengths__: The premise of the paper, stating the multiparameter persistence offers additional information, is certainly correct and an interesting topic to study in ML problems.
Making this TDA approach amenable to ML algorithms is also desirable. While kernel-based techniques have been developed, they typically come with a high computational burden. Hence, studying alternatives, motivated by work on persistence images, seems to be a natural way to go.
Technically, the idea of just having to go through the full computational complexity of computing the barcode once and then using the vineyard algorithm is also quite interesting.

__ Weaknesses__: My main concerns with the approach relate to three issues, outlined below:
First, while I can agree with the claim of multiparameter persistence offering more information, I do not think that the claim clearly holds up in the experiments. Yes, you can filter, e.g., by distance to the point cloud and density, but you could also do this separately. While it's a little unclear whether the "Dg" columns (in the experiments) represent exactly this, even if they did, there are more discriminative and machine-learning-compatible approaches to do this, e.g., work on learnable representations by Carriere et al., or Hofer et al., which are simpler to compute and typically offer better performance on most tasks. Comparing against existing multi-parameter persistence work is totally fine, but the authors should have included stronger baselines. Some questions that immediately came up are: What was the resolution for the multiparameter persistence images? As far as I understood, it's 10x10 or 50x50 (page 7), but as written it's somewhat unclear. If the resolution were, say, 50x50, the XGBoost classifier is operating in a 2500-dim. space, as you need to vectorize first, and so performance improvements could potentially be explained by that alone? If I am misunderstanding this, please clarify in the rebuttal.
Second, I do think the claim that the approach will work with actual MULTI-parameter persistence (i.e., > 2) is quite bold. E.g., it's not clear to me how the number of lines should be chosen if many more parameters were available. I would assume the lines would also have to be "close" in order for all the theory to hold. In a setting with more parameters, computing single-parameter persistence and, e.g., concatenating vectorized versions of the barcodes, seems much more accessible and computationally feasible. I also think that the discussion on the subtleties of the stability of the matchings (following Prop. 3.4.) is weak. While there is the claim that instabilities can be detected (or that they disappear for well-behaved multi-filtrations) algorithmically, the reader is left with little insight into whether this is an actual problem or not. I do find the stability result interesting, but without further insights, the result seems a little disconnected.
Finally, and importantly for the community, I found the notation throughout the paper to be quite hard to digest. E.g., what is "n" in Def. 3.3.? My guess is that this is a typo and should be p, right? (as the image has resolution p x p). I would also guess that the functions need to be bounded. Also, in the same definition, what are a, b, c, d? These cannot be totally arbitrary, right? Further, some confusion also came up with the notation bcd(f|_l), particularly the f|_l part, denoting the restriction of f to l: I mean, f is defined on X, so how does this restriction work? There seems to be something missing here ...

__ Correctness__: Technically, the paper appears correct, modulo some typos, but the notation is sometimes quite confusing. Also, some of the claims do not hold up empirically, in my point of view (see detailed comments).

__ Clarity__: Overall, the reader can follow the key ideas; however, when going into more detail, the paper is very hard to follow and would clearly benefit from additional editing and rounding off the edges. I think, in its current form, the NeurIPS community would have a hard time following the presented material.

__ Relation to Prior Work__: Prior work on multi-parameter persistence is well discussed and the presented approach also differs from these works in important aspects, but the advantages of multi-parameter persistence over existing (much simpler) single-parameter persistence approaches are not clearly worked out. I think it's totally fine to go through more (computational) trouble in terms of constructing the multiparameter persistence images, but there should also be a clear, demonstrable gain.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: I thank the authors for their response. After discussions, while I am positive about the potential of this paper, I agree with other reviewers that the paper needs stronger baselines for comparison, including comparing with state-of-the-art 1D-persistence-based approaches.
Topological data analysis has attracted much attention lately. In particular, the use of persistent homology based summaries for different types of data has found successful applications in different practical domains.
This paper develops a topological summary, called multiparameter persistence images, for multi-parameter (mostly 2D) persistence, and shows that it performs on par or better than other topological summaries for the multi-parameter case.
Multi-parameter settings are very important in practice, and having an effective, concise, and easily computable topological summary for such settings is highly desirable. However, multi-parameter persistence modules do not admit a simple decomposition (complete discrete representations) as is the case for 1D persistence. Previous attempts at computing a summary for the multi-parameter case typically boil down to taking ``slices'' (i.e., restricting it to lines) and computing persistence diagrams for those 1D slices. The approach proposed in this paper still relies on these 1D slices. However, the key innovation is that it also captures the ``connection'' between these 1D slices via the vineyard algorithm. This makes the new topological summary more informative than the previous ones based only on persistence diagrams for individual 1D slices.
Detailed strengths/weaknesses are below. Overall I am quite positive about this paper, mainly because I think multi-parameter-persistence-based summaries are an important topic. While what is developed in this paper can perhaps be improved, it is more informative than previous approaches, and the python package will be made available, which will help facilitate its use in practical applications. The current experimental results also show the effectiveness of this new summary.

__ Strengths__: + As mentioned above, developing an effective, concise, and easily computable topological summary for the multi-parameter setting is highly desirable, and will help further broaden the scope of topological data analysis. This paper makes a good attempt at addressing this.
+ The proposed multi-parameter persistence images encode more information than previous approaches. Experimental results also demonstrate this.
+ The proposed multi-parameter persistence image has a finite vector representation. Thus it will be easy to combine this representation with existing ML pipelines, including neural networks.
+ The python package is developed and will be made publicly available. The availability of easy-to-use tools for topological data analysis is crucial in facilitating the applications of these methods.

__ Weaknesses__: - The existing experiments are quite convincing, but the paper would be stronger if there were more baselines, such as more methods based on just single-parameter persistence, and/or other simple summarisations of the persistence diagrams computed from 1D slices. (A minor point: it would be useful to report the time it takes to compute these summaries.)
- There is a concern over the stability of the matching computed by the vineyard algorithm (the theoretical results do not directly apply to it in practical settings). I don't expect this to be addressed theoretically, but perhaps some empirical studies can be carried out to show that in practice, these summaries are rather stable.

__ Correctness__: I don't detect any issue from the main text. I read the Supplement as well and it also looks good.

__ Clarity__: The paper is overall very well-written. I have only two comments:
-- can you elaborate on the matching computed from the vineyard algorithm? (This can be put in Supplement.) This is important especially given that the stability of the approach depends on this.
-- Can you elaborate on the example in Figure 4: (1) what line set did you use to generate that multi-parameter persistence image? (2) why does that group of long bars necessarily indicate one major circle in the data? How should one interpret that? How would one differentiate the case where there is one major loop versus multiple loops?

__ Relation to Prior Work__: Adequate

__ Reproducibility__: Yes

__ Additional Feedback__: