This repository has been archived by the owner on Aug 5, 2022. It is now read-only.
How to create ImageNet LMDB
Feng Zou edited this page Nov 17, 2017 · 10 revisions
LMDB (Lightning Memory-Mapped Database) is a key-value store supported by the Intel distribution of Caffe*. One of its main advantages is high throughput. Training and validation datasets can be converted into the LMDB format.
##General scenario and parameters description
The Intel distribution of Caffe provides a script that helps users create an LMDB.
General steps:
- Sign up at http://image-net.org and download the ILSVRC2012 training and validation images. Store the training files and the validation files in separate directories.
- Run the script that downloads the auxiliary data:
$ ./data/ilsvrc12/get_ilsvrc_aux.sh
- If necessary, pre-process the training/validation data (e.g., resize the images to the required height/width).
- Create the LMDB with the script below:
$ ./examples/imagenet/create_imagenet.sh
Before running it, verify the following parameters of the script:
  - `TRAIN_DATA_ROOT` and `VAL_DATA_ROOT`: variables that point to the directories with the training and validation data
  - `resize_height`: the height to which each image will be resized
  - `resize_width`: the width to which each image will be resized
  - `shuffle`: if set, the entries are shuffled while the LMDB is created (their order will be random)
  - `encoded`: if true, the images are stored in the LMDB in their encoded (compressed) form
  - `$DATA/train.txt` or `$DATA/val.txt`: text file that assigns a class label to each image used for training or validation
  - `$EXAMPLE/ilsvrc12_train_lmdb` or `$EXAMPLE/ilsvrc12_val_lmdb`: the location where the LMDB will be saved
- Use the created LMDB in the Intel distribution of Caffe.
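The parameters listed above end up as flags and positional arguments of Caffe's `convert_imageset` tool. The sketch below only assembles and prints such a command line so that the mapping is visible; it does not run the tool. The function name `build_convert_cmd`, its argument order, and the 256x256 target size are illustrative assumptions, not the contents of `create_imagenet.sh`.

```shell
# Sketch: print (do not run) a convert_imageset command line built from
# the parameters described above. Helper name and defaults are assumptions.
build_convert_cmd() {
  local data_root="$1"        # e.g. $TRAIN_DATA_ROOT, ending with '/'
  local list_file="$2"        # e.g. $DATA/train.txt
  local out_db="$3"           # e.g. $EXAMPLE/ilsvrc12_train_lmdb
  local resize="${4:-false}"  # "true" to resize the images
  local encode="${5:-false}"  # "true" to store encoded images

  local h=0 w=0               # 0 keeps the original image size
  if [ "$resize" = "true" ]; then
    h=256; w=256              # assumed target size, adjust as needed
  fi

  local cmd="convert_imageset --resize_height=$h --resize_width=$w --shuffle"
  if [ "$encode" = "true" ]; then
    cmd="$cmd --encoded"
  fi

  # Positional arguments: image root folder, list file, output database.
  printf '%s %s %s %s\n' "$cmd" "$data_root" "$list_file" "$out_db"
}
```

For example, `build_convert_cmd /data/imagenet/train/ data/ilsvrc12/train.txt examples/imagenet/ilsvrc12_train_lmdb true true` prints one command line with both resize flags set to 256 and `--encoded` appended, which can be compared against what the script actually invokes.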
##Example execution:
For the purposes of this guide, the example imports training and validation data from ImageNet.
- Download ImageNet training and validation data.
- Navigate to the imagenet directory, e.g.,
cd path/to/caffe/examples/imagenet
- Edit the `create_imagenet.sh` script, which should contain the following:
TRAIN_DATA_ROOT=/data/imagenet/train/
VAL_DATA_ROOT=/data/imagenet/val/
RESIZE=true
...
ENCODE=true
...
- Run the script from the Caffe root directory (the path below is relative to it), e.g.:
./examples/imagenet/create_imagenet.sh
Output of the script run above:
Creating training lmdb...
...
Creating val lmdb...
I1124 10:58:44.212462 193703 convert_imageset.cpp:123] Shuffling data
I1124 10:58:44.219236 193703 convert_imageset.cpp:126] A total of 50000 images.
I1124 10:58:44.219633 193703 db_lmdb.cpp:72] Opened lmdb examples/imagenet/ilsvrc12_val_lmdb
I1124 10:58:51.641278 193703 convert_imageset.cpp:184] Processed 1000 files.
I1124 10:58:58.952800 193703 convert_imageset.cpp:184] Processed 2000 files.
I1124 10:59:05.942912 193703 convert_imageset.cpp:184] Processed 3000 files.
...
Done.
- The `ilsvrc12_train_lmdb` and `ilsvrc12_val_lmdb` directories should be created by the script, in the path set by the `EXAMPLE` variable.
- Update the `.prototxt` file of the particular model used in the Intel distribution of Caffe, e.g.,
data_param {
source: "examples/imagenet/ilsvrc12_train_lmdb"
batch_size: 256
backend: LMDB
}
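Before starting training, it can help to confirm that each `source` path in the `.prototxt` points at a database that actually exists. The following is a minimal sketch, assuming that an LMDB directory holds the usual `data.mdb` file and that each `source:` entry is written on a single line as in the snippet above. The function names `lmdb_sources`, `check_lmdb_dir`, and `check_prototxt_sources` are hypothetical helpers, not part of Caffe.

```shell
# Print every path given as source: "..." in a prototxt file.
lmdb_sources() {
  sed -n 's/^[[:space:]]*source:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Succeed only if the directory exists and holds a non-empty data.mdb.
check_lmdb_dir() {
  [ -d "$1" ] && [ -s "$1/data.mdb" ]
}

# Check all sources of a prototxt; return non-zero if any is missing.
# Relative paths are resolved against the current working directory,
# so run this from the directory the paths are relative to.
check_prototxt_sources() {
  local proto="$1" status=0 src
  while IFS= read -r src; do
    [ -n "$src" ] || continue
    if check_lmdb_dir "$src"; then
      echo "OK      $src"
    else
      echo "MISSING $src"
      status=1
    fi
  done <<EOF
$(lmdb_sources "$proto")
EOF
  return "$status"
}
```

Calling `check_prototxt_sources` with the path of the model's `.prototxt` prints one `OK` or `MISSING` line per `source` entry, and the exit status can be used to stop a training launch script early.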
*Other names and brands may be claimed as the property of others