Reviews: Integrating Markov processes with structural causal modeling enables counterfactual inference in complex systems

The key research question of this paper concerns the conditions under which a Markov process model (MPM) of a system observed at equilibrium can be successfully converted to an SCM. A complete answer to this question would require proofs of soundness (any input MPM is successfully converted to an SCM) and completeness (all MPMs can be converted to SCMs), which are not included in the paper. In their response, the authors note that "any probability model there exists a class of SCM models that are equivalent to that probability model in distribution," and say that they will "emphasize this point in the revised manuscript by including a lemma that translates this general result to our case." At a technical level, the paper is dense. To understand it completely, readers would need to know Markov process models, structural causal models, some molecular biology, and several areas of mathematics. I lack the background in Markov process models, molecular biology, and perhaps some areas of mathematics needed to fully understand the contributions of the paper. In their response, the authors promise to "adjust the text to make the key findings more succinct, and move biological exposition and mathematical details that are not essential to communicating those findings to Supplementary Materials." The authors position the work well within the larger context of work on causal inference, including some fairly obscure but extremely interesting recent work on causal inference applied to RL.
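The existence result quoted from the authors' response above can be illustrated, for a single variable, with the inverse-transform construction: a deterministic assignment X := f(N) driven by uniform noise N reproduces a target distribution. This is a minimal sketch under stated assumptions, not the authors' lemma or their conversion procedure. The function names `structural_assignment` and `sample_scm`, the exponential target, and the rate value are all hypothetical choices made for illustration.

```python
import math
import random
from typing import List


def structural_assignment(noise: float, rate: float) -> float:
    """Deterministic map X := f(N), with f the inverse CDF of Exp(rate).

    Assumption: the target is an exponential distribution, chosen only
    because its inverse CDF has a closed form.
    """
    return -math.log(1.0 - noise) / rate


def sample_scm(n: int, rate: float, seed: int = 0) -> List[float]:
    """Draw n values of X by sampling exogenous noise N ~ Uniform[0, 1)."""
    rng = random.Random(seed)
    return [structural_assignment(rng.random(), rate) for _ in range(n)]


samples = sample_scm(100_000, rate=2.0)
sample_mean = sum(samples) / len(samples)  # should be close to 1 / rate
```

All randomness sits in the noise term, so the same noise draw can be replayed under a modified assignment, which is what makes counterfactual queries possible in an SCM.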

UPDATE after the authors' rebuttal: Thank you for your response to our comments; it was interesting to read about the other reviewers' comments as well. I still believe your work is worth presenting at NIPS; the meta-reviewers now have the difficult job of making a selection. Best wishes for your work, which I would love to see published soon.

After the review period, I was able to read your manuscript in more depth (incl. the Suppl. material) and found a few remarks for you to fix:
- l81+82 notations: I assume it reads ...$ X_i $, $ \mathbf{p} = ... and then $ X_i \tilde p_i(... $, correct? NB: you should define the noises N_i here and not on l89.
- eqn (4): you gave me a hard time on this one :). According to Wilkinson 2011, p194, the second term (h_2) of the sum on the RHS should be for Y(t)+1 instead of Y(t)-1. This seems to be what you wrote on l69 of your suppl. material, which is reassuring regarding what you implemented!
- eqn (6) and (7): on a more careful reading, the g is not g(u)=u/(u+1); the derivation in the suppl. material reads OK to me, so it should be \frac{v_1 X_1(t)}{v_1 X_1(t) + v_2 X_2(t)}.
- l236: shouldn't it be E[K3(t)] instead of K3(t) in the RHS of the equation? This is what seems correct (and is what is written in Eqn (4) of the Suppl. material, which has missing brackets as a typo).

The work in this paper describes the merging of Markov processes with SCMs, ending up with the benefits of both models. To me, this is a very interesting, solid contribution, clearly worth presenting at NIPS. I list my questions along the text:
- SCM in abstract: avoid the acronym (and define it the first time it is used).
- l18: a biochemical kinetic law is ONE possible model of a biological system (making a few 'averaging' assumptions, by the way); others exist.
- In the introduction, I would introduce the reference to Wilkinson's book.
- l46: to be pedantic, can we 'empirically' demonstrate a statement? You rely on two case studies only, so this is more a proof of concept.
- l61: invariantS or invariant components (i.e. \sum_{i=1}^J X_i (t) = constant)?
- l71: 'occurs when': is it an iff, or a sufficient condition only?
- l80: it is not obvious that D needs to be acyclic. From a definition point of view, yes; from a purely causal perspective, this is a limiting assumption. Could it be discussed later? Any ideas to extend this?
- l81-82: the notation X_i p = ... and X_i p_i is unclear.
- l103-104: I assume the 2 distributions under the X_i^* = x do operator are DEFINED as equivalent. If it is a property, I am lost: how do you prove this?
- l114: again, some discussion on the approximation quality of plugging in P_N^{C;X=x} for P_N^C?
- l115: the notation looks like a burden. Is it needed? You don't seem to use it in your manuscript...
- Algorithms 1 and 2: I sympathise that you are limited by space, yet the presentation of both algorithms side by side is poor. Where is the quality of the $ \approx $ at the end of Algorithm 1 presented or discussed? In Algorithm 2, is it 0:ssize or 1:ssize? Are there no other parameters to be passed on to intModel()?
- l144: say that from here on you depict the Algorithm 1 steps.
- l145: a reference or a short explanation, as this is not obvious to me?
- l150: binomial or Bernoulli?
- l158: an interpretation of the inequality?
- l163: ...studies 1 and 2 THEREAFTER, it may...
- l207-209: the noise is assumed to be the same, not the deterministic part (averages), right?
- I am sure you knew someone was going to complain: Fig. 1 (b) and Fig. 2 (a) are barely readable when printed. It might pay to try something else and to make a more precise statement in the text about what you get out of these plots.

I would like to finish by congratulating the authors. I quite appreciated the 'discussion' type of research and the fact that you provide the code. Very convincing work.
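The remark on eqn (4) above can be checked numerically. This is a minimal sketch, assuming eqn (4) is a master equation of birth-death form, which this review does not state explicitly. The hazards h_1(y) = birth and h_2(y) = death * y, the rate values, and the names `poisson_pmf` and `master_rhs` are hypothetical choices for illustration and are not taken from the paper. For such a process the stationary distribution is Poisson with mean birth/death, so the right-hand side should vanish there only when the degradation inflow is evaluated at y + 1.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability mass; zero for negative counts."""
    if y < 0:
        return 0.0
    return math.exp(-lam) * lam ** y / math.factorial(y)


def master_rhs(y: int, lam: float, birth: float, death: float, shift: int) -> float:
    """dP(y)/dt evaluated at the Poisson(lam) distribution.

    shift=+1 takes the degradation inflow from state y + 1,
    shift=-1 reproduces the suspected sign error (inflow from y - 1).
    """
    def h1(s: int) -> float:  # production hazard (assumed constant)
        return birth if s >= 0 else 0.0

    def h2(s: int) -> float:  # degradation hazard (assumed linear in s)
        return death * s if s >= 0 else 0.0

    inflow = (h1(y - 1) * poisson_pmf(y - 1, lam)
              + h2(y + shift) * poisson_pmf(y + shift, lam))
    outflow = (h1(y) + h2(y)) * poisson_pmf(y, lam)
    return inflow - outflow


birth, death = 10.0, 1.0
lam = birth / death
residual_plus = max(abs(master_rhs(y, lam, birth, death, +1)) for y in range(40))
residual_minus = max(abs(master_rhs(y, lam, birth, death, -1)) for y in range(40))
```

Under these assumptions `residual_plus` is zero up to floating-point error while `residual_minus` is not, which agrees with the Y(t)+1 form.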

Paper ID:	7995
Title:	Integrating Markov processes with structural causal modeling enables counterfactual inference in complex systems
