The approach is general, which may allow extensions that incorporate other signals not typically available. This work also provides a mechanism for inspecting the contributions of the factors to language attention. Additionally, the authors evaluate in more than one VLN setting to demonstrate generalization. However, the approach is complicated, which may limit adoption. Additional analysis that provides an intuitive sense of the approach's strengths and weaknesses would strengthen its place in the literature.