Trying to Train VGG16 Model for localizing Text from natural images. Used Dataset MSRA-TD500 #20

Open
nikstar802 opened this issue Jul 21, 2017 · 7 comments

Comments

@nikstar802

Hi,
First of all, I want to say this library is awesome.

I am trying to localize text in natural images. I am training on a single image from the MSRA-TD500 dataset, using the VGG16 network you provide, but unfortunately the model is not converging as expected.

As a sanity check, I just want to train the network on a single image and test on that same image, but even that is not happening.

I am using the Adam optimizer with categorical crossentropy loss and 2 classes, to separate text and non-text areas.

For pre-processing, I subtract the mean pixel values from the original image and then divide the image by the standard deviation (a sketch of this step follows the log below). This is how the training goes:

Epoch 1/10
1/1 [==============================] - 64s - loss: 0.7233 - acc: 0.4443
Epoch 2/10
1/1 [==============================] - 51s - loss: 3.2022 - acc: 0.8014
Epoch 3/10
1/1 [==============================] - 52s - loss: 3.2022 - acc: 0.8014
Epoch 4/10
1/1 [==============================] - 52s - loss: 3.2022 - acc: 0.8014
Epoch 5/10
1/1 [==============================] - 52s - loss: 3.2022 - acc: 0.8014
Epoch 6/10
1/1 [==============================] - 51s - loss: 3.2022 - acc: 0.8014
Epoch 7/10
1/1 [==============================] - 52s - loss: 3.2022 - acc: 0.8014
Epoch 8/10
1/1 [==============================] - 51s - loss: 3.2022 - acc: 0.8014
Epoch 9/10
1/1 [==============================] - 51s - loss: 3.2022 - acc: 0.8014
Epoch 10/10
1/1 [==============================] - 51s - loss: 3.2021 - acc: 0.8014
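
For reference, the pre-processing step described above would be something like this (a minimal sketch; names are illustrative, not the exact script):

```python
# A minimal sketch: zero-mean, unit-variance normalisation of an image,
# as described above (per-channel mean and standard deviation).
import numpy as np

def preprocess(image):
    image = image.astype(np.float32)
    mean = image.mean(axis=(0, 1))   # per-channel mean pixel values
    std = image.std(axis=(0, 1))     # per-channel standard deviation
    return (image - mean) / (std + 1e-7)
```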

Can you suggest something on this issue?
Thanks!

@JihongJu
Owner

Hi @nikstar802, I'm glad you like it.
Given the training log, it seems the loss explodes after a few updates. Did you observe a sudden loss explosion or gradual loss growth?

@nikstar802
Author

Hi,
Thanks for the reply.

The loss is exploding suddenly: it jumps at the second epoch and then stays constant from the third epoch on.
I tried with the simplest image possible: I created my own image with a uniform blue background and put a few text items in the foreground with large font sizes.

But I am unable to understand why the network is not training properly.
Is it something related to weight initialization? I am keeping weights as 'None' before starting the training.
Or maybe something related to the optimizer: I tried SGD, RMSprop, and Adam, but nothing seems to work.

@JihongJu
Owner

JihongJu commented Jul 24, 2017

@nikstar802 Please first make sure you are working with the newest master branch, because I previously forgot to include the "softmax" activation, which causes sudden weight/loss explosions.
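
For context, the missing piece was along these lines (a minimal sketch; `score_map` is an illustrative placeholder for the final per-pixel score tensor):

```python
# A minimal sketch of the fix: apply a softmax over the class channel to the
# per-pixel scores, so categorical crossentropy sees proper probabilities.
from keras.layers import Activation

probabilities = Activation('softmax')(score_map)  # score_map: (H, W, n_classes)
```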

Other than that, it is also possible that:

  1. You are using too large a learning rate.
  2. Your dataset is imbalanced, so the model learns to predict zeros everywhere (a quick check is sketched below).
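
A quick way to check the second point (a minimal sketch; `mask` is an illustrative binary label mask):

```python
# A minimal sketch: measure the text vs. non-text pixel balance of a
# binary label mask before blaming the optimizer.
import numpy as np

def class_balance(mask):
    """Return (text_fraction, non_text_fraction) of a binary mask."""
    text = np.count_nonzero(mask)
    total = float(mask.size)
    return text / total, (total - text) / total
```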

@nikstar802
Author

Hi, thanks for the reply.
I was already using the latest master branch. Actually, I had changed the activation from softmax to sigmoid to experiment with the model. I changed it back to softmax, and here are the results now.

Epoch 1/100
Epoch 00000: val_loss improved from inf to 6.38728, saving model to /tmp/fcn_vgg16_weights.h5
1/1 [==============================] - 98s - loss: 0.8112 - acc: 0.4567 - val_loss: 6.3873 - val_acc: 0.6039
Epoch 2/100
Epoch 00001: val_loss did not improve
1/1 [==============================] - 66s - loss: 6.3873 - acc: 0.6039 - val_loss: 6.3874 - val_acc: 0.6039
Epoch 3/100
Epoch 00002: val_loss did not improve
1/1 [==============================] - 67s - loss: 6.3874 - acc: 0.6039 - val_loss: 6.3875 - val_acc: 0.6039
Epoch 4/100
Epoch 00003: val_loss did not improve
1/1 [==============================] - 66s - loss: 6.3875 - acc: 0.6039 - val_loss: 6.3876 - val_acc: 0.6039
Epoch 5/100
Epoch 00004: val_loss did not improve
1/1 [==============================] - 74s - loss: 6.3876 - acc: 0.6039 - val_loss: 6.3878 - val_acc: 0.6039
Epoch 6/100
Epoch 00005: val_loss did not improve
1/1 [==============================] - 71s - loss: 6.3878 - acc: 0.6039 - val_loss: 6.3879 - val_acc: 0.6039
Epoch 7/100
Epoch 00006: val_loss did not improve
1/1 [==============================] - 74s - loss: 6.3879 - acc: 0.6039 - val_loss: 6.3880 - val_acc: 0.6039
Epoch 8/100
Epoch 00007: val_loss did not improve
1/1 [==============================] - 72s - loss: 6.3880 - acc: 0.6039 - val_loss: 6.3881 - val_acc: 0.6039
Epoch 9/100
Epoch 00008: val_loss did not improve
1/1 [==============================] - 71s - loss: 6.3881 - acc: 0.6039 - val_loss: 6.3883 - val_acc: 0.6039
Epoch 10/100
Epoch 00009: val_loss did not improve
1/1 [==============================] - 71s - loss: 6.3883 - acc: 0.6039 - val_loss: 6.3884 - val_acc: 0.6039
Epoch 11/100
Epoch 00010: val_loss did not improve
1/1 [==============================] - 71s - loss: 6.3884 - acc: 0.6039 - val_loss: 6.3885 - val_acc: 0.6039
Epoch 12/100
Epoch 00011: val_loss did not improve
1/1 [==============================] - 69s - loss: 6.3885 - acc: 0.6039 - val_loss: 6.3886 - val_acc: 0.6039
Epoch 13/100
Epoch 00012: val_loss did not improve
1/1 [==============================] - 73s - loss: 6.3886 - acc: 0.6039 - val_loss: 6.3886 - val_acc: 0.6039
Epoch 14/100
Epoch 00013: val_loss did not improve
1/1 [==============================] - 73s - loss: 6.3886 - acc: 0.6039 - val_loss: 6.3886 - val_acc: 0.6039
Epoch 15/100
Epoch 00014: val_loss did not improve
1/1 [==============================] - 66s - loss: 6.3886 - acc: 0.6039 - val_loss: 6.3887 - val_acc: 0.6039
Epoch 16/100
Epoch 00015: val_loss did not improve
1/1 [==============================] - 65s - loss: 6.3887 - acc: 0.6039 - val_loss: 6.3887 - val_acc: 0.6039
Epoch 17/100
Epoch 00016: val_loss did not improve
1/1 [==============================] - 65s - loss: 6.3887 - acc: 0.6039 - val_loss: 6.3887 - val_acc: 0.6039
Epoch 18/100
Epoch 00017: val_loss did not improve
1/1 [==============================] - 65s - loss: 6.3887 - acc: 0.6039 - val_loss: 6.3887 - val_acc: 0.6039
Epoch 19/100
Epoch 00018: val_loss did not improve
1/1 [==============================] - 65s - loss: 6.3887 - acc: 0.6039 - val_loss: 6.3887 - val_acc: 0.6039
Epoch 20/100
Epoch 00019: val_loss did not improve
1/1 [==============================] - 66s - loss: 6.3887 - acc: 0.6039 - val_loss: 6.3887 - val_acc: 0.6039
Epoch 21/100
Epoch 00020: val_loss did not improve
1/1 [==============================] - 65s - loss: 6.3887 - acc: 0.6039 - val_loss: 6.3887 - val_acc: 0.6039
Epoch 22/100
Epoch 00021: val_loss did not improve
1/1 [==============================] - 65s - loss: 6.3887 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 23/100
Epoch 00022: val_loss did not improve
1/1 [==============================] - 66s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 24/100
Epoch 00023: val_loss did not improve
1/1 [==============================] - 68s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 25/100
Epoch 00024: val_loss did not improve
1/1 [==============================] - 66s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 26/100
Epoch 00025: val_loss did not improve
1/1 [==============================] - 67s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 27/100
Epoch 00026: val_loss did not improve
1/1 [==============================] - 68s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 28/100
Epoch 00027: val_loss did not improve
1/1 [==============================] - 66s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 29/100
Epoch 00028: val_loss did not improve
1/1 [==============================] - 67s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 30/100
Epoch 00029: val_loss did not improve
1/1 [==============================] - 76s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 31/100
Epoch 00030: val_loss did not improve
1/1 [==============================] - 70s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 32/100
Epoch 00031: val_loss did not improve
1/1 [==============================] - 70s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039

I have a few questions:

  1. Will the model learn from just a single image? (I am using the same image for validation as well.)
  2. Whatever I train, when I use model.predict I always get a blank 500x500 image with no objects in it. Am I doing something wrong?
  3. What about weight initialization? You are using 'imagenet'; I am not initializing with anything, just 'None'. Is that a problem?
  4. Learning rate: I am using a learning rate of 0.001 with the Adam or RMSprop optimizer. Is that appropriate?
  5. Is text extraction not possible with this VGG16 model?

...
Thanks

@JihongJu
Owner

@nikstar802

  1. If you have only one image, it is very difficult to train a VGG net from scratch.
  2. What is the ratio of textual pixels to non-textual pixels? Imbalanced training samples can be one reason for predicting a blank image.
  3. See 1. Pre-trained models won't help either if only one image is provided.
  4. I used Adam with lr=1e-4 for the VOC2011 dataset (a sketch combining points 3 and 4 follows this list).
  5. It is possible, but a good choice of hyperparameters is required.
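
Points 3 and 4 together would look roughly like this in plain Keras (a minimal sketch; not necessarily this repo's exact API):

```python
# A minimal sketch: start from ImageNet-pretrained VGG16 weights rather than
# weights=None, and train with Adam at lr=1e-4.
from keras.applications.vgg16 import VGG16
from keras.optimizers import Adam

encoder = VGG16(weights='imagenet',        # pretrained, instead of None
                include_top=False,
                input_shape=(500, 500, 3))
# ... build the FCN decoder on top of `encoder`, then:
# model.compile(optimizer=Adam(lr=1e-4), loss='categorical_crossentropy',
#               metrics=['accuracy'])
```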

@nikstar802
Author

Hi,
I realized that my ratio of textual to non-textual pixels is too low; that might be the issue, because I am resizing large training images to 500x500, and the text features may get disrupted during resizing.

So now I am randomly cropping each training image into 500x500 segments, 20 crops per image, giving me 20 sub-images of 500x500 from a single training image. This is the training set I am feeding in (a cropping sketch follows).
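
Such aligned random cropping could look like this (a minimal sketch; names are illustrative):

```python
# A minimal sketch: cut n_crops random 500x500 patches out of an image and
# its label mask, keeping image and mask aligned. Assumes the image is at
# least size x size pixels.
import numpy as np

def random_crops(image, mask, size=500, n_crops=20, rng=np.random):
    h, w = image.shape[:2]
    for _ in range(n_crops):
        top = rng.randint(0, h - size + 1)
        left = rng.randint(0, w - size + 1)
        yield (image[top:top + size, left:left + size],
               mask[top:top + size, left:left + size])
```

Here is my model summary: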


Layer (type)                     Output Shape           Param #     Connected to
=================================================================================
input_1 (InputLayer)             (None, 500, 500, 3)    0
block1_conv1 (Conv2D)            (None, 500, 500, 64)   1792        input_1[0][0]
block1_conv2 (Conv2D)            (None, 500, 500, 64)   36928       block1_conv1[0][0]
block1_pool (MaxPooling2D)       (None, 250, 250, 64)   0           block1_conv2[0][0]
block2_conv1 (Conv2D)            (None, 250, 250, 128)  73856       block1_pool[0][0]
block2_conv2 (Conv2D)            (None, 250, 250, 128)  147584      block2_conv1[0][0]
block2_pool (MaxPooling2D)       (None, 125, 125, 128)  0           block2_conv2[0][0]
block3_conv1 (Conv2D)            (None, 125, 125, 256)  295168      block2_pool[0][0]
block3_conv2 (Conv2D)            (None, 125, 125, 256)  590080      block3_conv1[0][0]
block3_conv3 (Conv2D)            (None, 125, 125, 256)  590080      block3_conv2[0][0]
block3_pool (MaxPooling2D)       (None, 63, 63, 256)    0           block3_conv3[0][0]
block4_conv1 (Conv2D)            (None, 63, 63, 512)    1180160     block3_pool[0][0]
block4_conv2 (Conv2D)            (None, 63, 63, 512)    2359808     block4_conv1[0][0]
block4_conv3 (Conv2D)            (None, 63, 63, 512)    2359808     block4_conv2[0][0]
block4_pool (MaxPooling2D)       (None, 32, 32, 512)    0           block4_conv3[0][0]
block5_conv1 (Conv2D)            (None, 32, 32, 512)    2359808     block4_pool[0][0]
block5_conv2 (Conv2D)            (None, 32, 32, 512)    2359808     block5_conv1[0][0]
block5_conv3 (Conv2D)            (None, 32, 32, 512)    2359808     block5_conv2[0][0]
block5_pool (MaxPooling2D)       (None, 16, 16, 512)    0           block5_conv3[0][0]
block5_fc6 (Conv2D)              (None, 16, 16, 4096)   102764544   block5_pool[0][0]
dropout_1 (Dropout)              (None, 16, 16, 4096)   0           block5_fc6[0][0]
block5_fc7 (Conv2D)              (None, 16, 16, 4096)   16781312    dropout_1[0][0]
dropout_2 (Dropout)              (None, 16, 16, 4096)   0           block5_fc7[0][0]
score_feat1 (Conv2D)             (None, 16, 16, 1)      4097        dropout_2[0][0]
score_feat2 (Conv2D)             (None, 32, 32, 1)      513         block4_pool[0][0]
upscore_feat1 (BilinearUpSamplin (None, 32, 32, 1)      0           score_feat1[0][0]
scale_feat2 (Lambda)             (None, 32, 32, 1)      0           score_feat2[0][0]
add_1 (Add)                      (None, 32, 32, 1)      0           upscore_feat1[0][0]
                                                                    scale_feat2[0][0]
score_feat3 (Conv2D)             (None, 63, 63, 1)      257         block3_pool[0][0]
upscore_feat2 (BilinearUpSamplin (None, 63, 63, 1)      0           add_1[0][0]
scale_feat3 (Lambda)             (None, 63, 63, 1)      0           score_feat3[0][0]
add_2 (Add)                      (None, 63, 63, 1)      0           upscore_feat2[0][0]
                                                                    scale_feat3[0][0]
upscore_feat3 (BilinearUpSamplin (None, 500, 500, 1)    0           add_2[0][0]
activation_1 (Activation)        (None, 500, 500, 1)    0           upscore_feat3[0][0]
=================================================================================
Total params: 134,265,411
Trainable params: 134,265,411
Non-trainable params: 0


Now, with this model, even one epoch is not completing; my system hangs on this.
Kindly let me know what I can best do to validate that my model works at least on a single image, so that I can go ahead and arrange a GPU for training.

...
Thanks
Nikunj

@JihongJu
Owner

@nikstar802
It is generally not recommended to train with a single image. You can feed multiple images or patches per iteration instead of using a single image for many iterations (a sketch follows below).
How to debug a training process is kind of tricky because many things can go wrong, and lack of data is always one of them. This post may be useful to get some hints.
In general, you only know whether it works when you train with the largest dataset you can get.
As a proof of concept, you could subsample the images and use a smaller model, e.g. AlexNet.
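
Feeding multiple patches per iteration could look like this (a minimal sketch; assumes a `random_crops` helper like the one above, and illustrative names):

```python
# A minimal sketch: a generator yielding batches of aligned image/mask
# patches, so each weight update sees several samples instead of one image.
import numpy as np

def patch_batches(images, masks, batch_size=4):
    xs, ys = [], []
    while True:
        for img, msk in zip(images, masks):
            for x, y in random_crops(img, msk, n_crops=1):
                xs.append(x); ys.append(y)
                if len(xs) == batch_size:
                    yield np.stack(xs), np.stack(ys)
                    xs, ys = [], []

# model.fit_generator(patch_batches(train_images, train_masks),
#                     steps_per_epoch=50, epochs=10)
```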
