Omp target code example
// cnn_cpu.c
...
network->input = data->X;
network->truth = data->y;
...
count = 0;
while (count < max_batches) { // host
    for (int i = 0; i < img.col/batch; i++) {
        forward(network->input, network->truth, network->weights, ...); // device
        backward(network->output, network->truth, network->weights, ...); // device
        update(network->weights, ...); // device
    }
    count++; // host
}
...
// cnn_target.c
...
network->input = data->X;
network->truth = data->y;
...
count = 0;
#pragma omp target data map(tofrom: network) // How to copy data in terms of a structure? To split data and other parameters?
{
    while (count < max_batches) {
        for (int i = 0; i < img.col/batch; i++) {
            #pragma omp target // How to handle Sync?
            {
                forward(network->input, network->truth, network->weights, ...);
                backward(network->output, network->truth, network->weights, ...);
                update(network->weights, ...);
            } // target region ends
        } // for loop ends
        count++;
    } // while loop ends
} // target data region ends
Within forward/backward/update modules
while (count < max_batches) {
    for (int l = 0; l < network->n; l++) {
        #pragma omp parallel for
        for (int i = 0; i < network->batch*network->outputs; i++) {
            for (int j = 0; j < network->channels; j++) {
                // conv...
            }
        }
        #pragma omp parallel for
        for (;;) {
            for (;;) {
                for (;;) {
                    // conv...
                }
            }
        }
        ...
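Inside a target region, a plain parallel for runs within a single team, so the combined form target teams distribute parallel for is commonly used to spread a loop over the whole device. The sketch below shows that form on a 1D "valid" convolution. The function conv1d_target and its parameters are hypothetical stand-ins for illustration, not code from the network modules.

```c
#include <assert.h>

/* Hypothetical stand-in for one conv layer: 1D "valid" convolution.
   out must have room for n - klen + 1 elements. */
void conv1d_target(const float *in, int n, const float *k, int klen, float *out)
{
    assert(in && k && out && klen > 0 && n >= klen);
    int n_out = n - klen + 1;

    /* Each output element is independent, so the outer loop is
       distributed across teams and across threads within each team. */
    #pragma omp target teams distribute parallel for \
        map(to: in[0:n], k[0:klen]) map(from: out[0:n_out])
    for (int i = 0; i < n_out; i++) {
        float acc = 0.0f;
        for (int j = 0; j < klen; j++) {
            acc += in[i + j] * k[j];
        }
        out[i] = acc;
    }
}
```

If the arrays are already present on the device because of an enclosing target data region, the map clauses here do not trigger another copy.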
All the data, including images and model parameters, are in the structure network. Among them, network->input and network->truth need to be copied to the device but do not need to be copied back; network->weights and network->output need to be copied to the device and copied back at the end.
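One possible way to express these copy directions is to map the member arrays individually as array sections, since mapping the structure pointer by itself does not copy the arrays its members point to. The sketch below assumes a cut-down structure network_t in which every array has length n, and a placeholder computation. Both are hypothetical and only illustrate the map clauses.

```c
#include <assert.h>

/* Hypothetical, cut-down version of the network structure. */
typedef struct {
    float *input;    /* copied to the device only */
    float *truth;    /* copied to the device only */
    float *weights;  /* copied to the device and back */
    float *output;   /* copied to the device and back */
    int n;           /* assumed common length of all four arrays */
} network_t;

/* Placeholder step: output = weights * input; weights += truth. */
void train_mapped(network_t *net, int steps)
{
    assert(net && net->n > 0 && steps >= 0);

    /* Local aliases: an array section needs a base pointer and a length. */
    float *input = net->input, *truth = net->truth;
    float *weights = net->weights, *output = net->output;
    int n = net->n;

    #pragma omp target data map(to: input[0:n], truth[0:n]) \
                            map(tofrom: weights[0:n], output[0:n])
    {
        for (int s = 0; s < steps; s++) {
            /* The arrays are already present, so these maps copy nothing. */
            #pragma omp target teams distribute parallel for \
                map(to: input[0:n], truth[0:n]) \
                map(tofrom: weights[0:n], output[0:n])
            for (int i = 0; i < n; i++) {
                output[i] = weights[i] * input[i];
                weights[i] += truth[i];
            }
        }
    } /* weights and output are copied back to the host here */
}
```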
On entry to a target data region, memory for each mapped array is allocated on the device, and if the map-type is to or tofrom the host values are copied to the device at that point (alloc and from only allocate). omp target works together with omp target data: when a target region is entered and the array is already present on the device, the existing device copy is reused and no further copy happens, unless the always modifier or omp target update is used. On exit from the target data region, the array on the host is updated if the map-type is from or tofrom. After that, the array on the device is deallocated (omp target, when it is not inside a target data region, works the same way on its own entry and exit).
An array that only needs to be copied to the device can be handled by omp target with the to map-type: the data is copied from host to device and nothing is copied back. An array that needs to be copied back to the host can be handled by omp target data, omp target, or omp target update. In this case, the map-type of omp target data is tofrom, which maps the memory in both directions between host and device. Also, because the target regions are inside the target data region, data is not copied back to the host on exit from each target region, even if the map-type on omp target is tofrom.
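To look at intermediate results on the host while still inside the target data region, omp target update can request the copy explicitly. The following is a minimal sketch under assumptions: step_and_snapshot, snap, and the "+1" update are hypothetical placeholders for the real training step.

```c
#include <assert.h>

/* After each step, refresh the host copy of weights and record weights[0]
   in snap[s]. snap must have room for `steps` elements. */
void step_and_snapshot(float *weights, int n, int steps, float *snap)
{
    assert(weights && snap && n > 0 && steps >= 0);

    #pragma omp target data map(tofrom: weights[0:n])
    {
        for (int s = 0; s < steps; s++) {
            /* weights is already present: no copy on entry or exit here. */
            #pragma omp target teams distribute parallel for \
                map(tofrom: weights[0:n])
            for (int i = 0; i < n; i++) {
                weights[i] += 1.0f;
            }

            /* Explicit device-to-host copy inside the data region. */
            #pragma omp target update from(weights[0:n])
            snap[s] = weights[0];
        }
    }
}
```

On a device with separate memory, dropping the target update line would be expected to leave the host copy unchanged until the target data region exits.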
Compile
The Clang and LLVM versions need to match. For instance, on cci-carina, the library path for llvm-14 is shown below.
clang target.c -L/opt/llvm/llvm-14.x-install/lib/ -lm -fopenmp -fopenmp-targets=nvptx64 -o target
Run
./target
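A target region silently falls back to the host when no device is available, so it can help to check where the code actually ran. A small sketch using the standard omp_is_initial_device routine follows; the function name target_ran_on_device is an assumption for illustration, and the code is guarded so that it also builds without -fopenmp.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns 1 if a target region executed on an offload device,
   0 if it ran on the host (or if OpenMP is disabled). */
int target_ran_on_device(void)
{
    int on_host = 1;
#ifdef _OPENMP
    #pragma omp target map(tofrom: on_host)
    {
        on_host = omp_is_initial_device() ? 1 : 0;
    }
#endif
    assert(on_host == 0 || on_host == 1);
    return !on_host;
}
```

With OpenMP 5.0 runtimes, setting the environment variable OMP_TARGET_OFFLOAD=MANDATORY is another option: the run should then fail instead of falling back to the host.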
Without nowait on omp target, the run time is very long; most of the time is spent on synchronization (according to nvprof).
- try to pipeline data copy of next batch and computation of current batch
- optimization: derive the batch size and an adapted schedule from the input data
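The pipelining idea can be sketched with nowait and depend clauses: the copy of the next batch is queued as a deferred task while the kernel for the current batch runs, and the host waits only once at the end. The function batch_sums_pipelined and its per-batch sum are hypothetical placeholders for the real forward/backward/update work, and whether the copy and the kernel truly overlap depends on the runtime and the device.

```c
#include <assert.h>

/* Sums each batch of `len` floats from data into sums[b]. */
void batch_sums_pipelined(const float *data, int n_batches, int len, float *sums)
{
    assert(data && sums && n_batches > 0 && len > 0);

    /* Start copying batch 0 without blocking the host. */
    #pragma omp target enter data map(to: data[0:len]) \
        nowait depend(out: data[0])

    for (int b = 0; b < n_batches; b++) {
        int off = b * len;

        if (b + 1 < n_batches) {
            int next = off + len;
            /* Queue the copy of the next batch; it may overlap the kernel below. */
            #pragma omp target enter data map(to: data[next:len]) \
                nowait depend(out: data[next])
        }

        /* Runs once the copy of the current batch has completed. */
        #pragma omp target map(to: data[off:len]) map(from: sums[b:1]) \
            nowait depend(in: data[off])
        {
            float s = 0.0f;
            for (int i = 0; i < len; i++) {
                s += data[off + i];
            }
            sums[b] = s;
        }

        /* Release the device copy of this batch after its kernel is done. */
        #pragma omp target exit data map(release: data[off:len]) \
            nowait depend(inout: data[off])
    }

    /* The host synchronizes once here instead of after every target region. */
    #pragma omp taskwait
}
```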