The paper proposes a novel text-guided image manipulation method built on a word-level discriminator loss. The proposed method is faster and requires less memory than existing models, and the experimental results show improvements over the baseline method (ManiGAN). The paper initially received mixed ratings, but the rebuttal addressed the reviewers’ concerns and all reviewers converged in favor of acceptance. The authors should revise the paper to reflect the reviewers’ suggestions and the changes promised in the rebuttal. NOTE FROM PROGRAM CHAIRS: For the camera-ready version, please expand your broader impact statement to discuss the potential negative impacts of your work, such as forgery and deepfakes, as well as possible mitigations.