## Checkpoint data

Contents of the pretrained checkpoint `model_id1-501-1448236541.t7`:

- Training loss: `loss_history = {#iteration: loss}`
- Predicted sentences on validation examples: `val_predictions = {#iteration: {image_id, caption}}`
- Validation loss: `val_loss_history = {#iteration: validation loss}`
- Language metrics: `val_lang_stats_history = {#iteration: {Bleu_1, Bleu_2, Bleu_3, Bleu_4, ROUGE_L, METEOR, CIDEr}}`
- Vocabulary: `vocab = {word_id: word}`
- Number of iterations: `iter = 132500`
- Configuration options (see below): `opt = {option_name: value}`
- Actual trained models (see below): `protos = {cnn: trained convnet, lm: trained LSTM}`

Note: each of these is accessed as `checkpoint.<field>`, e.g. `checkpoint.opt` or `checkpoint.protos`.
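A minimal sketch of loading the checkpoint and reading a few of these fields, assuming Torch7 on a CUDA machine with the cudnn bindings installed, run from the neuraltalk2 repository root (the custom `misc.LanguageModel` class must be on the Lua path so that `torch.load` can deserialize `protos.lm`):

```lua
-- Minimal loading sketch; the environment and paths are assumptions, not part of the checkpoint.
require 'torch'
require 'nn'
require 'nngraph'
require 'cutorch'
require 'cunn'
require 'cudnn'                 -- the saved convnet consists of cudnn.* layers
require 'misc.LanguageModel'    -- needed so torch.load can reconstruct protos.lm

local checkpoint = torch.load('model_id1-501-1448236541.t7')

print(checkpoint.iter)            -- 132500
print(checkpoint.opt.rnn_size)    -- 768
for id, word in pairs(checkpoint.vocab) do print(id, word); break end  -- peek at one vocabulary entry
print(checkpoint.protos.cnn)      -- the trained convnet (printed in full below)
print(checkpoint.protos.lm)       -- the trained LSTM language model
```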

## LSTM architecture

Single layer; the input, hidden, and cell vectors are all 768-dimensional, and the vocabulary contains 9567 words. The image vector produced by the convnet is fed as the input at t = 0, the start token is fed at t = 1, and from then on generation proceeds as usual, feeding the previously predicted word at each step. This is not as described in the original NeuralTalk paper, where a vanilla RNN was used and the image vector was fed in as the first hidden vector [Karpathy and Fei-Fei, 2015].
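As a quick sanity check, the sizes quoted above can be read back from the loaded checkpoint (a small sketch, continuing from the loading snippet in the previous section):

```lua
-- Continues from the loading sketch above; `checkpoint` is already in scope.
print(checkpoint.opt.rnn_size)             -- 768: size of the hidden and cell vectors
print(checkpoint.opt.input_encoding_size)  -- 768: size of the word/image input vectors

local n_words = 0
for _ in pairs(checkpoint.vocab) do n_words = n_words + 1 end
print(n_words)                             -- 9567 words in the vocabulary
```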

## Convnet architecture

VGG-16 architecture with the top layer replaced so that it produces a 768-dimensional output vector.
Output of `print(checkpoint.protos.cnn)`:

```
nn.Sequential {
	[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> (38) -> (39) -> (40) -> output]
	(1): cudnn.SpatialConvolution(3 -> 64, 3x3, 1,1, 1,1)
	(2): cudnn.ReLU
	(3): cudnn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1)
	(4): cudnn.ReLU
	(5): cudnn.SpatialMaxPooling(2x2, 2,2)
	(6): cudnn.SpatialConvolution(64 -> 128, 3x3, 1,1, 1,1)
	(7): cudnn.ReLU
	(8): cudnn.SpatialConvolution(128 -> 128, 3x3, 1,1, 1,1)
	(9): cudnn.ReLU
	(10): cudnn.SpatialMaxPooling(2x2, 2,2)
	(11): cudnn.SpatialConvolution(128 -> 256, 3x3, 1,1, 1,1)
	(12): cudnn.ReLU
	(13): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
	(14): cudnn.ReLU
	(15): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
	(16): cudnn.ReLU
	(17): cudnn.SpatialMaxPooling(2x2, 2,2)
	(18): cudnn.SpatialConvolution(256 -> 512, 3x3, 1,1, 1,1)
	(19): cudnn.ReLU
	(20): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
	(21): cudnn.ReLU
	(22): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
	(23): cudnn.ReLU
	(24): cudnn.SpatialMaxPooling(2x2, 2,2)
	(25): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
	(26): cudnn.ReLU
	(27): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
	(28): cudnn.ReLU
	(29): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
	(30): cudnn.ReLU
	(31): cudnn.SpatialMaxPooling(2x2, 2,2)
	(32): nn.View(-1)
	(33): nn.Linear(25088 -> 4096)
	(34): cudnn.ReLU
	(35): nn.Dropout(0.500000)
	(36): nn.Linear(4096 -> 4096)
	(37): cudnn.ReLU
	(38): nn.Dropout(0.500000)
	(39): nn.Linear(4096 -> 768)
	(40): cudnn.ReLU
}
```
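To extract the 768-dimensional image code that the LSTM consumes at t = 0, the convnet can be run forward on a preprocessed image batch. The sketch below continues from the loading snippet and uses a blank placeholder input; real images must be preprocessed as during training (resized/cropped to 224x224 with the VGG pixel mean subtracted, see `net_utils.prepro` in the neuraltalk2 repository).

```lua
-- Continues from the loading sketch above. `img` is only a placeholder; replace it
-- with a properly preprocessed batch (224x224 crops, VGG mean subtracted).
local cnn = checkpoint.protos.cnn
cnn:evaluate()                                        -- disables the dropout layers (35) and (38)

local img = torch.CudaTensor(1, 3, 224, 224):zero()   -- batch of one blank image
local feats = cnn:forward(img)
print(feats:nElement())                               -- 768: the image code fed to the LSTM at t = 0
```

From here the repository's evaluation code hands these features to `checkpoint.protos.lm` to sample captions.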

## Options

Output of `print(checkpoint.opt)`:

```
{
	cnn_optim_beta : 0.999
	finetune_cnn_after : 0
	batch_size : 16
	val_images_use : 3200
	optim_epsilon : 1e-08
	input_encoding_size : 768
	losses_log_every : 25
	id : "1-501-1448236541"
	optim_beta : 0.999
	input_h5 : "/scr/r6/karpathy/cocotalk.h5"
	rnn_size : 768
	cnn_learning_rate : 1e-05
	cnn_optim_alpha : 0.8
	language_eval : 1
	learning_rate_decay_every : 50000
	optim : "adam"
	gpuid : 0
	cnn_model : "model/VGG_ILSVRC_16_layers.caffemodel"
	drop_prob_lm : 0.75
	grad_clip : 0.1
	cnn_weight_decay : 0
	input_json : "/scr/r6/karpathy/cocotalk.json"
	seed : 123
	learning_rate_decay_start : -1
	seq_per_img : 5
	cnn_optim : "adam"
	max_iters : -1
	checkpoint_path : "checkpoints"
	start_from : "/scr/r6/karpathy/neuraltalk2_checkpoints/vgood1/model_id3-230-1448140513.t7"
	learning_rate : 0.0004
	cnn_proto : "model/VGG_ILSVRC_16_layers_deploy.prototxt"
	backend : "cudnn"
	save_checkpoint_every : 2500
	optim_alpha : 0.8
}
```
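The same options can be dumped programmatically from the loaded checkpoint, for example as name/value pairs (continuing from the loading sketch above):

```lua
-- Continues from the loading sketch above: print every stored option.
for name, value in pairs(checkpoint.opt) do
  print(string.format('%-28s %s', name, tostring(value)))
end
```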