Segmentation model implementation speedup ideas (model.predict is slow) #31
Comments
Shoreline Extraction Process - Performance Issues

Overview
We are experiencing slowdowns in our shoreline extraction process. Below are the key differences in our approach compared to the standard CoastSat method, which we believe might be contributing to these performance issues.

Key Differences
The remaining steps are largely similar to the CoastSat approach and are not detailed here.

Planned Improvements
Please try the following to see if they cause any performance increases in the shoreline extraction process for the zoo workflow.

Tasks
Okay, there are two problems being articulated here. I opened the issue with doodleverse/do_seg in mind, or perhaps you are referring to the subsequent process of using the npz files (image segmentation outputs) in shoreline extraction. My focus right now is how slow, in general, the doodleverse-utils implementation I made is. However, I now see that keras.utils has utilities for loading and batching images directly from a folder.
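The snippet that followed didn't survive the copy/paste; here is a minimal sketch, assuming the loader in question is tf.keras.utils.image_dataset_from_directory (suggested by the color_mode argument mentioned below); the directory path and image size are hypothetical:

```python
import tensorflow as tf

# Sketch (assumption): batch-load the jpegs with image_dataset_from_directory,
# which builds a parallel tf.data pipeline instead of reading one file at a time
sample_dir = "/path/to/RGB/jpegs"   # hypothetical directory
TARGET_SIZE = (768, 768)            # hypothetical model input size

ds = tf.keras.utils.image_dataset_from_directory(
    sample_dir,
    labels=None,          # inference only, no labels
    color_mode="rgb",
    batch_size=8,
    image_size=TARGET_SIZE,
    shuffle=False,        # preserve file order so outputs map back to inputs
)
```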
Standardization can be applied using https://www.tensorflow.org/api_docs/python/tf/keras/layers/Normalization. This layer will shift and scale inputs into a distribution centered around 0, which I finally figured out could be applied like so:
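A minimal sketch of how the Normalization layer could be applied to the batched dataset (assumed, not the author's exact code):

```python
# Sketch: per-channel standardization with tf.keras.layers.Normalization;
# mean/variance are estimated from a sample of batches via adapt()
norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(ds.take(50))

ds_std = ds.map(lambda x: norm(x), num_parallel_calls=tf.data.AUTOTUNE)
```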
Now we can call the model.predict step on all the jpeg files, using all cores, like so:
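A sketch of that predict step over the whole dataset; the tf.data pipeline keeps the CPU cores busy decoding and standardizing while the model runs (model here stands for the already-loaded segmentation model):

```python
# Sketch: batched inference over every jpeg in the folder
ds_std = ds_std.prefetch(tf.data.AUTOTUNE)
softmax_scores = model.predict(ds_std, verbose=1)   # (N, H, W, num_classes)
```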
:) Now I need to figure out how to implement it in the ensemble model, etc., and how to get it into the doodleverse, but I just tested it on 1175 images and it worked great.
This approach would be adapted to NDWI and MNDWI jpegs using color_mode="gray".
I think maybe this issue should be in doodleverse_utils. Ultimately, I think you just need to call
@dbuscombe-usgs
This is a complicated upgrade because it needs to be able to deal with a lot of different scenarios (large and small images, large and small numbers of total images), and because of the custom inputs/outputs we need for model inference. I'm working with the largest dataset (5500x7500-pixel imagery, up to tens of thousands of samples) in order to explore options, because the greatest limitation is always GPU memory. I have found that, in order to use the GPU(s), a distribution strategy is needed. This is typically defined within a scope of a single GPU or multiple GPUs:
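The code block that followed was lost; here is a minimal sketch, assuming tf.distribute.MirroredStrategy is the scope being described and that the weights file name is hypothetical:

```python
import tensorflow as tf

# Sketch (assumption): a distribution strategy scoped over one or more GPUs;
# the model must be built or loaded inside the scope
strategy = tf.distribute.MirroredStrategy()
print("Number of devices:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.models.load_model("weights.h5", compile=False)   # hypothetical weights file
```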
Then my current implementation uses the GPUs for the model inference step itself.
My custom garbage collector class is:
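The class itself was stripped from the comment; a plausible stand-in is a small Keras callback that forces garbage collection during prediction (assumed, not the original code):

```python
import gc
import tensorflow as tf

class MemoryCleaner(tf.keras.callbacks.Callback):
    """Hypothetical stand-in for the custom garbage collector:
    periodically force Python garbage collection during predict()."""

    def __init__(self, every_n_batches=100):
        super().__init__()
        self.every_n_batches = every_n_batches

    def on_predict_batch_end(self, batch, logs=None):
        if batch % self.every_n_batches == 0:
            gc.collect()

# usage: model.predict(ds_std, callbacks=[MemoryCleaner()])
```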
This is a profoundly different approach: it uses Keras' normalization layer (which I haven't yet been able to verify produces the same results as our custom image standardization routine). It also does the argmax and resizing steps afterwards, on the CPU. This is for memory management and speed. This is the binarization step, which can happen on the GPUs:
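A sketch of that binarization step (argmax over the class scores), reusing the standardized dataset from the sketches above:

```python
# Sketch: per-class scores from model.predict, collapsed to integer labels on the GPU
softmax_scores = model.predict(ds_std)           # (N, H, W, num_classes)
est_label = tf.argmax(softmax_scores, axis=-1)   # (N, H, W)
```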
This is the resizing step, which must happen on the CPU for large datasets:
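A sketch of the CPU-side resizing, pinned to the CPU where there is enough memory for very large scenes; the target dimensions are just the example image size mentioned above:

```python
# Sketch: resize label images back to native resolution on the CPU
with tf.device("/CPU:0"):
    labels = tf.expand_dims(tf.cast(est_label, tf.uint8), -1)          # add channel axis for resize
    labels = tf.image.resize(labels, (7500, 5500), method="nearest")   # nearest keeps integer classes
    labels = tf.squeeze(labels, axis=-1).numpy()
```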
It works, but I'm still exploring consistency with previous approaches.
The above workflow is specific to the ResUNet, because of the resizing step. SegFormer needs a slightly different implementation.
Notes on the SegFormer implementation. First, the model is called using a simpler API provided by TF; the model is then constructed as follows:
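The construction code was stripped; here is a hedged sketch, assuming the TensorFlow SegFormer class from Hugging Face transformers and the nvidia/mit-b0 checkpoint mentioned further down (the number of classes is hypothetical):

```python
from transformers import TFSegformerForSemanticSegmentation

NCLASSES = 2   # hypothetical number of classes
segformer = TFSegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0",
    num_labels=NCLASSES,
    ignore_mismatched_sizes=True,   # swap in a fresh classification head
)
```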
SegFormer models take reshaped inputs, with the channels rearranged from (0,1,2) to (2,0,1). This adds a layer of complexity because only the inputs for SegFormer need to be reshaped. This is dealt with using a custom Lambda layer to reshape the inputs:
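A sketch of such a Lambda layer, transposing a channels-last batch (B, H, W, C) into the channels-first layout (B, C, H, W) that SegFormer expects:

```python
# Sketch: channels-last -> channels-first, applied only on the SegFormer path
to_channels_first = tf.keras.layers.Lambda(
    lambda x: tf.transpose(x, perm=[0, 3, 1, 2]), name="to_channels_first"
)
```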
Finally, we use the logits from the SegFormer model, so the inference code is adapted like so:
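A sketch of that adapted inference loop (assumed, not the original code, and reusing names from the sketches above): SegFormer returns channels-first logits at a quarter of the input resolution, so they are transposed back, upsampled, and argmax'd:

```python
# Sketch: batched SegFormer inference using the logits
for batch in ds_std:
    logits = segformer(pixel_values=to_channels_first(batch), training=False).logits  # (B, NCLASSES, H/4, W/4)
    logits = tf.transpose(logits, perm=[0, 2, 3, 1])                   # back to channels-last
    scores = tf.image.resize(logits, TARGET_SIZE, method="bilinear")   # upsample to input size
    est_label = tf.argmax(scores, axis=-1).numpy()
```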
Groovy!! We now have a single streamlined workflow for both ResUNets and SegFormers.
This is the specific model checkpoint we modify by fine-tuning to new data: https://huggingface.co/nvidia/mit-b0. In the future we could upgrade to https://huggingface.co/nvidia/mit-b1 or similar.
Thank you for laying out the steps it takes to construct the model, add the new layers, how it's called in the code, and finally how it can run in inference mode. It all being laid out like that made it much easier to understand. I do have a minor question: in the code below, what is K?
That's my shorthand for the Keras backend, which I use to access https://www.tensorflow.org/api_docs/python/tf/keras/backend/clear_session
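In code, that shorthand is just:

```python
from tensorflow.keras import backend as K

K.clear_session()   # frees the global Keras state between models
```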
Oh, and I also use this convenient-for-my-purposes function: https://github.com/tensorflow/tensorflow/blob/23c218785eac5bfe737eec4f8081fd0ef8e0684d/tensorflow/python/keras/_impl/keras/backend.py#L1696
@dbuscombe-usgs - I still can't find the exact script I used to make the Lambda layer output the predicted segmentation, but here is a related code block developing a Lambda layer to output a confidence (the difference between the highest and lowest logit for each pixel, then summed over all pixels) (from this nb). I think it can easily be adapted with a squeeze and an argmax:
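The code block itself was lost in the copy/paste; a hedged reconstruction of the idea (not the original notebook code):

```python
import tensorflow as tf

# Sketch: confidence = (max logit - min logit) per pixel, summed over all pixels
def confidence_fn(logits):
    per_pixel = tf.reduce_max(logits, axis=-1) - tf.reduce_min(logits, axis=-1)  # (B, H, W)
    return tf.reduce_sum(per_pixel, axis=[1, 2])                                 # (B,)

confidence_layer = tf.keras.layers.Lambda(confidence_fn, name="confidence")
```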
Then attach it to the Gym model at the end:
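Attaching it could look like this, where model stands for the existing Gym model (an assumption about its functional form):

```python
# Sketch: new model whose output is the summed per-image confidence
conf_model = tf.keras.Model(inputs=model.input, outputs=confidence_layer(model.output))
```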
and then use the model as normal... so I bet something like this would work (untested):
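A sketch of that untested suggestion: the same trick with a squeeze and an argmax so the model emits label images directly (model and ds_std reuse the names from the sketches above):

```python
# Sketch (untested, as noted above): squeeze + argmax wrapped in a Lambda
label_layer = tf.keras.layers.Lambda(
    lambda x: tf.argmax(tf.squeeze(x), axis=-1), name="pred_label"
)
label_model = tf.keras.Model(inputs=model.input, outputs=label_layer(model.output))
est_label = label_model.predict(ds_std)
```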
Thanks for this. Just looking at it for the first time today. For a SegFormer model, there is no equivalent output layer to attach to. The model is defined thus:
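The definition that followed was stripped; presumably it is the transformers from_pretrained call, whose extra keyword arguments are what the next sentence refers to. A hedged sketch (the class names here are hypothetical):

```python
# Sketch (assumption): SegFormer defined via from_pretrained; the **kwargs end up
# as config overrides such as num_labels / id2label / label2id
id2label = {0: "water", 1: "land"}   # hypothetical class names
model = TFSegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0",
    num_labels=len(id2label),
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
    ignore_mismatched_sizes=True,
)
```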
The **kwargs are (very typically) badly documented, so I don't know if there is an option to specify argmax (I doubt it). After some searching, I don't know how to proceed. So, I am moving on to the 'normalization/standardization issue'.
I can't make any progress on standardization either. I can't seem to get https://keras.io/api/layers/preprocessing_layers/numerical/normalization/ to function properly. It never produces the intended output of a batched tensor with zero mean and unit variance, no matter whether I rescale the imagery first, reorder channels, specify channels, etc. Always wrong. I can't find any other examples.
@ebgoldstein did you say you had an example of a custom standardization layer?
For example, this code runs but always produces garbage:
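The pasted code didn't survive; here is a sketch of the kind of usage being attempted (assumed, not the author's exact code), which should in principle give roughly zero mean and unit variance per channel:

```python
import numpy as np
import tensorflow as tf

# Sketch: adapt a Normalization layer to a batch of images and check the output stats
batch = np.random.randint(0, 255, size=(8, 512, 512, 3)).astype("float32")

norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(batch)
out = norm(batch)

print(tf.reduce_mean(out).numpy(), tf.math.reduce_std(out).numpy())  # expect ~0 and ~1
```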
At the moment, UNet models are called in a loop, one image at a time. Model inference time reported by Keras is approximately constant, but the overall time per image increases steadily. Over the course of ~1000 images, the slow-down is about 4x-5x, on both Windows and Linux.
This is not a priority right now, but should be fixed eventually. There are at least two aspects.
Would this eventually be solved by switching to Dask?
Leaving this issue here to be looked at later.
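One possible mitigation, following the clear_session/garbage-collection discussion above (untested against this particular slowdown; the file list, helper, and weights path are hypothetical):

```python
import gc
import tensorflow as tf
from tensorflow.keras import backend as K

# Sketch: periodically reset Keras state inside the per-image loop and reload the model
for i, f in enumerate(files):                    # 'files': the existing list of image paths
    est_label = model.predict(read_image(f))     # existing per-image inference (hypothetical helper)
    if i > 0 and i % 250 == 0:
        K.clear_session()
        gc.collect()
        model = tf.keras.models.load_model("weights.h5", compile=False)  # hypothetical weights path
```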