There's no difference: Convolutional Neural Networks for transient detection without template subtraction
We present a Convolutional Neural Network (CNN) model for the separation of astrophysical transients from image artifacts, a task known as "real-bogus" classification, that does not rely completely on Difference Image Analysis (DIA), a computationally expensive process involving image matching on small spatial scales in large volumes of data, because it does not train on the difference image.
We train two CNNs for the "real-bogus" classification:
- The DIA-based model, trained on the template, search, and difference images.
- The noDIA model, trained on the template and search images only.
With this study, we intend to show that:
- A "real-bogus" classifier can reach high accuracies (97% and 92%) through the use of Convolutional Neural Networks, which avoid the feature extraction step by taking the images themselves as input data. These results are consistent with those of other CNN models, e.g., Gieseke et al. (2017) and Cabrera-Vives et al. (2016, 2017).
- As a proof of concept, the cost of transient detection can be reduced, since it is possible to avoid Difference Image Analysis (DIA).
We use data from the first season of the Dark Energy Survey (DES). These data were used to train autoscan, the "real-bogus" classifier implemented by DES, described in Goldstein et al. (2015).
Some examples of the data:
The data are 50% "bogus" objects, like the first two objects in the image, and 50% "real" objects, like the last two.
Because we are using a type of model (CNNs) originally designed for tasks such as classifying dogs and cats, we have to pay extra attention to the values used as input data to train the models.
- For the difference images, we standardized the pixel values to a mean μ of 0 and a standard deviation σ of 1.
- For the template and search images, there are many extreme values outside the 3σ interval. The following image shows the pixel value distribution (in this case, brightness) within the 3σ interval.
- To preserve this information while keeping the resolution of the values inside the 3σ interval, we map the interval μ±3σ to 0-1, so that the extreme values become greater than 1 or less than 0 (see the sketch after this list). The following image shows the same four examples above after this mapping.
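A minimal sketch of these two normalizations, assuming each stamp is a 2D NumPy array (the function names are illustrative, not the repository's actual code):

```python
import numpy as np

def standardize(stamp):
    """Standardize a difference-image stamp to mean 0 and standard deviation 1."""
    return (stamp - stamp.mean()) / stamp.std()

def map_to_3sigma(stamp):
    """Map the interval mu +/- 3*sigma to [0, 1]; extreme pixels fall outside [0, 1]."""
    mu, sigma = stamp.mean(), stamp.std()
    return (stamp - (mu - 3 * sigma)) / (6 * sigma)
```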
We horizontally stacked the images to closely mimic what we, as human scanners, do to classify the images as "real" or "bogus". On the left are some examples for the DIA-based model, and on the right for the noDIA model (template and search only).
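For illustration, the stacking amounts to an np.hstack of the normalized stamps; the 51×51 stamp size and the variable names below are assumptions of this sketch:

```python
import numpy as np

# Illustrative stamps: 51x51 is an assumed size, and the random arrays stand in
# for already-normalized template, search, and difference images.
template, search, difference = (np.random.rand(51, 51) for _ in range(3))

dia_input = np.hstack([template, search, difference])  # DIA-based model, shape (51, 153)
nodia_input = np.hstack([template, search])            # noDIA model, shape (51, 102)
```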
For the DIA-based and noDIA models, we designed two similar architectures. In this way, we enable an easier and more direct comparison of how the two CNN models classify "bogus" and "real" objects. The hyperparameters are not re-optimized for the noDIA case; we simply wanted to compare the decision-making process of the CNN models when the difference image is removed.
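The actual architectures are described in the paper and repository; the Keras snippet below is only an illustrative stand-in for a small CNN of this kind, with assumed layer sizes and input shape:

```python
from tensorflow.keras import layers, models

def build_model(input_shape=(51, 153, 1)):
    """Illustrative small CNN; for the noDIA model the input narrows to two stamps, e.g. (51, 102, 1)."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # probability of "real"
    ])

model = build_model()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```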
We obtain two models with very high accuracies for the "real-bogus" classification process:
- Confusion matrix for the DIA-based model on the 20,000 images used for testing, showing 97% accuracy.
- Confusion matrix for the noDIA model on the 20,000 images used for testing, showing 92% accuracy.
- Loss and accuracy curves for the DIA-based model (left) and the noDIA model (right).
We have two models that classify the "real-bogus" data with high accuracy. We wanted to interpret these results through a feature importance analysis and provide some intuition about the differences between the DIA-based and noDIA models, so we decided to explore the saliency maps of the two models.
Saliency maps quantify the importance of each pixel of an image given as input to a CNN.
They provide some level of interpretability, through a process akin to feature importance analysis, by enabling an assessment of which pixels the model relies on most for the final classification.
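One common way to compute a saliency map, shown here as a hedged sketch rather than our exact procedure, is to take the gradient of the model's output score with respect to the input pixels using tf.GradientTape:

```python
import numpy as np
import tensorflow as tf

def saliency_map(model, stamp):
    """Absolute gradient of the model output with respect to the input pixels."""
    x = tf.convert_to_tensor(stamp[np.newaxis, ..., np.newaxis], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        score = model(x)
    grads = tape.gradient(score, x)
    return tf.abs(grads)[0, ..., 0].numpy()  # same height and width as the input stamp
```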
If we are classifying dogs and cats, we would expect to have a lot of important pixels in the faces, eyes, bodies, ears, etc of the dogs and cats.
- For the DIA-based case, we expected to find a lot of important pixels in the difference image.
- For the noDIA case, the expectation was less clear, beyond our experience (intuition) as human scanners in "real-bogus" classification.
We designed a simple metric for the quantitative analysis: the sum of the saliency values in each segment of the input image (e.g., only the pixels in the difference image), normalized by the sum over all three segments. If, for example, the model relies only on the difference image for the final classification, that segment would have a score of 1 and the template and search segments a score of 0. If the model uses information from the three images equally, each segment would have a score of ~0.33.
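A minimal sketch of this metric, assuming the saliency map has the same horizontally stacked layout as the input image (segment order and helper name are illustrative):

```python
import numpy as np

def segment_scores(saliency, n_segments=3):
    """Fraction of the total saliency that falls in each horizontally stacked segment."""
    segments = np.hsplit(saliency, n_segments)        # e.g. [template, search, difference]
    sums = np.array([seg.sum() for seg in segments])
    return sums / sums.sum()

# Scores close to [1/3, 1/3, 1/3] mean the model draws on the three stamps equally;
# [0, 0, 1] would mean it relies on the difference image alone.
```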
Confusion matrix reporting the proportion of transients for which the highest concentration of important pixels is found in the difference, search, or template portion of the input image for the DIA-based model.
In general, the DIA-based model relies most on the information in the difference (D) image. For the "real" objects correctly classified as "real" (the dark blue square), the model relied most on the difference image for 90% of them, on the search image for 9%, and on the template for 1%.
Confusion matrix reporting the proportion of transients for which the highest concentration of important pixels is found in the difference, search, or template portion of the input image for the noDIA model.
In general, the noDIA model relies more on the information in the template (T) image. For the objects classified as "real", correctly or incorrectly (dark blue and light orange squares), the ratio between the template and the search scores was lower than for the objects classified as "bogus".
Pandas, Numpy, Keras, TensorFlow, Matplotlib, Seaborn.
All the code that supports the analysis presented here is available in a dedicated GitHub repository.
To run the code, several steps need to be followed:
- Download the data from autoscan, including the csv file with features
- Convert the csv file to a feather file (see the sketch after this list)
- Run 3sigma_data.py to create the train and test data sets
- Run job-final.sh, specifying the name of the model and the type of data. For example, for a model named "CCCC" run on the data mapped to the 3σ interval in the noDIA case, the name would be "CCCC_3s2DH"; this is how the script knows where to read the data from and whether the difference image is used.
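A minimal sketch of the csv-to-feather conversion in the second step; the file names are placeholders, not the repository's actual paths:

```python
import pandas as pd

# Placeholder file names; substitute the feature table downloaded from autoscan.
features = pd.read_csv("autoscan_features.csv")
features.to_feather("autoscan_features.feather")  # feather I/O requires pyarrow
```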