All reviewers recommend acceptance after the author response and discussion. The reviewers value the paper for its contributions, including:
- a novel approach: functional Fisher information based regularization for different modalities
- an extensive experimental evaluation on 4 datasets

I agree with this assessment and accept.

A major concern with the work is that the authors provide many corrections, fixes, and additions in the appendix and rebuttal, and the appendix is rather extensive. This was considered as a reason for rejection, but the reviewers and AC see the strength of the paper and expect the authors to do the following:
1) Follow up and revise the paper accordingly.
2) Make the paper self-contained, so that a typical reader does not need to read the appendix or prior work to understand it; the paper should instead clearly refer to the appendix for additional information and results.
3) Clarify early in the paper what is meant by (multi-)modality, as this is typically understood as different semantic modalities, e.g. text and images, rather than different "features"/representations of the same modality.
4) If possible, consider including additional results on VQA-CP v2, as suggested by R4.