| Paper ID | 1434 |
|---|---|
| Title | Transfer Anomaly Detection by Inferring Latent Domain Representations |

This paper presents a novel approach for transferring learned representations to new but related domains. Detector parameters for new domains are inferred from existing ones without re-training. This approach is well suited to anomaly detection, where anomaly labels are hard to obtain because anomalies are rare. Additionally, the method is flexible enough to allow target-domain data to be used for training. The anomaly score function parameters for a target domain are inferred from the trained parameters (whose training data may or may not include target-domain instances) and from latent domain vectors, which are estimated by sampling from a model (multivariate Gaussian or other). Experiments include a synthetic toy dataset as well as benchmark datasets, including rotated MNIST. In the case of MNIST, the domains are rotations of the digits in 15-degree increments (6 domains are created: 0 to 75 degrees). Five domains are used for training and the remaining one for transfer. The method is compared to 8 other anomaly detection approaches. Results demonstrate that the method works very well and surpasses all other methods.

PROS: The paper is well written and technically sound, and the mathematical exposition is quite extensive. The method is original and novel. The experiments are relatively extensive, and the results show that the approach surpasses other anomaly detection approaches. The method is flexible in that it allows normal instances of the target domain to be used if they are present at training time, but also allows the anomaly score function to be inferred when those instances are not present.

CONS: The authors should give some intuition as to when the approach starts to break down as the source and target domains become less related. The toy dataset and MNIST-r exhibit clear domains; however, the domains of the other datasets are not intuitive. The computational complexity of the method is only briefly stated, without any justification. Given that sampling can be expensive, it would be useful to get a feel for the actual cost. In particular, since the paper mentions IoT applications that would benefit from not needing to retrain, it would be interesting to see the speedups. A pseudo-code algorithm would be helpful for somebody wanting to reproduce these experiments. For MNIST-r, the authors do not specify what the domains are. One can guess from Table 1 that they are 15-degree rotation increments.
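
To make the MNIST-r protocol as I read it concrete, here is a minimal sketch of the domain split. The names `ANGLES` and `leave_one_domain_out` are hypothetical and not taken from the paper, and the assumption that each rotation serves in turn as the held-out target domain is my own reading, not something the paper states explicitly.

```python
# Illustrative sketch only: names are hypothetical, not from the paper.
# Six rotation domains in 15-degree increments: 0, 15, 30, 45, 60, 75.
ANGLES = list(range(0, 90, 15))


def leave_one_domain_out(domains):
    """Yield (source_domains, target_domain) pairs.

    Assumption: each domain is held out once as the transfer target,
    and the remaining five domains are used for training.
    """
    for target in domains:
        sources = [d for d in domains if d != target]
        yield sources, target


# Example usage: enumerate the six train/transfer splits.
for sources, target in leave_one_domain_out(ANGLES):
    print(f"train on rotations {sources}, transfer to rotation {target}")
```

Stating the split this explicitly in the paper, together with the rotation angles, would remove the guesswork about what the MNIST-r domains are.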

This paper proposes a transfer-learning-based anomaly detection algorithm that infers the anomaly scores of target-domain examples by transferring knowledge from related domains. To achieve this, latent domain representation vectors are learned to capture the domain knowledge. Quantitative experiments and empirical studies are presented to demonstrate the efficacy of the proposed algorithm. In general, the paper is well written and well organized. My main concern is that the paper lacks insightful discussion of the proposed problem and algorithm. In particular: (1) The proposed algorithm requires knowing the exact number of normal and abnormal examples in each domain, which could be impractical in real-world applications. (2) The target domain and the source domains often have imbalanced amounts of data. Will this be problematic for the proposed algorithm? (3) In many applications, the anomalies partially overlap with the normal patterns. Can the proposed algorithm handle such a case?

This paper proposes a practical method for anomaly detection across different domains. The proposed method does not require anomalous instances and can be used directly or fine-tuned in the target domain. The authors present their idea logically. For example, they combine a Gaussian distribution and neural networks to generate the latent vectors; they add a regularization term to avoid overfitting and modify the loss function to use only normal instances in fine-tuning. The experiments on 4 public datasets verify the effectiveness of the proposed method. The experiments in Section 5 and the supplementary material are complete. However, the 4 datasets are relatively simple compared to computer vision (CV) datasets, and even the NN baseline (a neural network classifier with only one hidden layer) achieves good results on them. Could the proposed method be used on CV datasets? Is the proposed method limited by the performance of the encoder and decoder? In addition, in Section 4.2, how Z is used in network F and how Z is estimated by the neural network are not clearly indicated. These details are important for reproducing the method.