No validation on backup files - roll back not possible #27

Open
starkjs opened this issue Dec 15, 2021 · 6 comments

Comments

@starkjs

starkjs commented Dec 15, 2021

There is no validation of backup files.
In one case the backup path filled up while the script was running, so a number of jar files were modified without ever being backed up.
That makes a rollback impossible, which is very bad.

if [ ! -f "$targetbackup" ]; then
  echo "Backing up to '$targetbackup'"
  cp -f "$jarfile" "$targetbackup"
fi
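A minimal sketch of how the copy above could be validated before the jar is modified. It assumes that checking the exit status of `cp` and comparing the two files byte for byte with `cmp` is sufficient; a checksum comparison (for example with `sha256sum`) would serve the same purpose. The function name `backup_and_verify` is hypothetical and does not come from the project's script.

```shell
# Hypothetical helper, not taken from the project's script.
# Returns 0 only when a usable backup exists for the given file.
backup_and_verify() {
  local src="$1" backup="$2"

  # Assumption: an existing backup comes from an earlier, verified run,
  # so it is left untouched (the source may already have been modified).
  if [ -f "$backup" ]; then
    return 0
  fi

  echo "Backing up to '$backup'"
  # cp exits non-zero when the target is full or unwritable;
  # cmp -s catches a copy that was silently truncated.
  if ! cp -f "$src" "$backup" || ! cmp -s "$src" "$backup"; then
    echo "ERROR: backup of '$src' failed or is incomplete" >&2
    rm -f "$backup"   # remove any partial copy so a re-run does not trust it
    return 1
  fi
}

# Usage (hypothetical): only modify the jar when the backup is verified.
# backup_and_verify "$jarfile" "$targetbackup" || echo "Not modifying $jarfile"
```

Removing a partial copy on failure matters because the original snippet skips the copy whenever the backup path already exists.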
@jtran-cloudera
Contributor

Thanks for the report. We are looking into a fix.

@starkjs
Author

starkjs commented Dec 15, 2021

No worries @jtran-cloudera

I will submit a PR today; I have a number of fixes. My clients have raised cases with Cloudera too, so it's in the notes for those cases.

@starkjs
Author

starkjs commented Dec 16, 2021

Hi @jtran-cloudera, I am not able to share my code publicly as it's IP.
I have fed the code changes back via our Cloudera consultant, and he will pass them on through the Cloudera case my client has open.
Thanks
Josh

@starkjs
Author

starkjs commented Dec 19, 2021

Hi @jtran-cloudera, @sdevineni, I see you added the code to validate the backup file, but only for jar files. It's needed for every backup file, including the tar.gz, the nar and the new uberjar code.

I see you also added

when the code doesn't match the backup. I think that is a bad idea, as it will exit the entire script at that point.

Thanks
Josh

@sunilgovind
Contributor

Yes, we are working to update this for nar files as well.

If the backup fails, it could be because of permissions or space-related issues. Hence a fail-fast approach is adopted, so that the reason behind the failed backup can be figured out.

@starkjs
Author

starkjs commented Dec 19, 2021

Hi @sunilgovind,

Sounds good. I have already added the SHA checksum to the tar.gz and nar too.

I disagree. From an automation point of view, I don't want the script to die; it should report the issue, take no action on the affected file, and move on. When you have to run the patch on hundreds or thousands of servers, you don't have time to stop and debug in Production. All testing needs to be done in NonProd, with all the issues sorted out before running in Production.

Thanks
Josh
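
The report-and-continue behaviour argued for above could look roughly like the sketch below. It is an illustration only, not the code submitted through the Cloudera case: `patch_all`, `patch_file` and the `.backup` suffix are hypothetical names. A file without a valid backup is reported and skipped, the loop moves on, and the function returns non-zero at the end so that automation can still detect a partial run.

```shell
# Placeholder for whatever modifies a file; hypothetical, replace as needed.
patch_file() {
  echo "Patching '$1'"
}

# Hypothetical report-and-continue loop: never exits on a failed backup.
patch_all() {
  local failed=0 f
  for f in "$@"; do
    # Keep an existing backup; otherwise copy and verify byte for byte.
    if [ ! -f "$f.backup" ]; then
      if ! cp -f "$f" "$f.backup" || ! cmp -s "$f" "$f.backup"; then
        echo "SKIPPED (no valid backup): $f" >&2
        rm -f "$f.backup"          # do not leave a partial backup behind
        failed=$((failed + 1))
        continue                   # take no action on this file, move on
      fi
    fi
    patch_file "$f"
  done
  echo "Files skipped: $failed"
  [ "$failed" -eq 0 ]              # non-zero status if anything was skipped
}

# Usage (hypothetical): patch_all /path/to/a.jar /path/to/b.jar
```

Compared with exiting on the first failure, this trades immediate diagnosis for an unattended run, so the skipped files have to be collected from the output afterwards.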

sdevineni pushed a commit that referenced this issue Dec 22, 2021
…rable class for both HDP and CDH side. Re-run is not needed. (#27)