NeurIPS 2020

TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation


Meta Review

The reviewers were positive about the ideas in the paper and mostly debated the merits of the evaluation. In particular, they were not fully convinced by the rebuttal's argument that boundary sharpness differs between action localization and sign language translation. For the camera-ready version, I would suggest addressing this point more thoroughly, as well as comparing against, or justifying the differences from, "Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation", Camgoz et al., CVPR 2020. One final suggestion is to add results with one more video encoder in addition to I3D.