Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error when a not yet inserted job is updated #7196

Merged
merged 7 commits into from
Feb 18, 2021

Conversation

davimacedo
Copy link
Member

New Pull Request Checklist

  • [ X ] I am not disclosing a vulnerability.
  • [ X ] I am creating this PR in reference to an issue.

Issue Description

An error can happen if you try to update a job that was just created. Because of this we have flaky tests. The error is easier to reproduce if you change the job message but it can happen even when it completes fast and the status needs to be changed.

Related issue: #7180

Approach

I actually investigated a lot our code and did a lot of tests regarding write concerns, etc, but I didn't find the root cause. I actually believe the problem is not in our side but it is probably a bug on MongoDB 4.4. The solution I used was just upsert the job status when updating it. Since it fixes the problem and I can't see any downside, I believe it is an acceptable solution.

TODOs before merging

  • Add test cases
  • [ X ] Add entry to changelog
  • Add changes to documentation (guides, repository pages, in-code descriptions)
  • Add security check
  • Add new Parse Error codes to Parse JS SDK
  • ...

@codecov
Copy link

codecov bot commented Feb 16, 2021

Codecov Report

Merging #7196 (07187c0) into master (a430d6f) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #7196   +/-   ##
=======================================
  Coverage   94.02%   94.02%           
=======================================
  Files         172      172           
  Lines       12867    12867           
=======================================
  Hits        12098    12098           
  Misses        769      769           
Impacted Files Coverage Δ
src/Controllers/DatabaseController.js 95.46% <0.00%> (-0.15%) ⬇️
src/Adapters/Files/GridFSBucketAdapter.js 80.32% <0.00%> (+0.81%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a430d6f...82a84f9. Read the comment docs.

@davimacedo davimacedo mentioned this pull request Feb 16, 2021
4 tasks
Copy link
Member

@mtrezza mtrezza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually believe the problem is not in our side but it is probably a bug on MongoDB 4.4. The solution I used was just upsert the job status when updating it.

LGTM, I think we could find out whether it is a MongoDB 4.4 bug if we just run a loop of saving and immediately updating an obj, bypassing first the Parse.Job logic, then bypassing the DB adapter.

Could there be a scenario in which the upsert creates a job that has already been deleted due to a race conditions? I think that should fail gracefully.

@mtrezza
Copy link
Member

mtrezza commented Feb 16, 2021

Actually, looking at the current run and the error in Postgres, it seems to be the same issue with a different object, so it’s probably not MongoDB related.

Cloud Code cloud jobs should define a job
  Message:
    Unhandled promise rejection: error: relation "_JobStatus" does not exist (line 1180)
  Stack:
    error properties: Object({ length: 108, severity: 'ERROR', code: '42P01', detail: undefined, hint: undefined, position: '8', internalPosition: undefined, internalQuery: undefined, where: undefined, schema: undefined, table: undefined, dataType: undefined, constraint: undefined, file: 'parse_relation.c', routine: 'parserOpenTable' })
        at <Jasmine>
        at Parser.parseErrorMessage (/home/runner/work/parse-server/parse-server/node_modules/pg-protocol/dist/parser.js:278:15)
        at Parser.handlePacket (/home/runner/work/parse-server/parse-server/node_modules/pg-protocol/dist/parser.js:126:29)
        at Parser.parse (/home/runner/work/parse-server/parse-server/node_modules/pg-protocol/dist/parser.js:39:38)
        at Socket.stream.on (/home/runner/work/parse-server/parse-server/node_modules/pg-protocol/dist/index.js:10:42)
        at Socket.emit (events.js:198:13)
        at Socket.EventEmitter.emit (domain.js:448:20)
        at addChunk (_stream_readable.js:288:12)
        at readableAddChunk (_stream_readable.js:269:11)
        at Socket.Readable.push (_stream_readable.js:224:10)
        at TCP.onStreamRead [as onread] (internal/stream_base_commons.js:94:17)

@davimacedo
Copy link
Member Author

davimacedo commented Feb 16, 2021

I actually believe the problem is not in our side but it is probably a bug on MongoDB 4.4. The solution I used was just upsert the job status when updating it.

LGTM, I think we could find out whether it is a MongoDB 4.4 bug if we just run a loop of saving and immediately updating an obj, bypassing first the Parse.Job logic, then bypassing the DB adapter.

Could there be a scenario in which the upsert creates a job that has already been deleted due to a race conditions? I think that should fail gracefully.

I don't think so. I am not aware of any code in place that deletes JobStatus. The developers could maybe write a custom code and delete a JobStatus using the master key. I believe that's not a common scenario though.

I will try what you suggested anyways and I will check different versions of mongo

@davimacedo
Copy link
Member Author

Actually, looking at the current run and the error in Postgres, it seems to be the same issue with a different object, so it’s probably not MongoDB related.

Cloud Code cloud jobs should define a job
  Message:
    Unhandled promise rejection: error: relation "_JobStatus" does not exist (line 1180)
  Stack:
    error properties: Object({ length: 108, severity: 'ERROR', code: '42P01', detail: undefined, hint: undefined, position: '8', internalPosition: undefined, internalQuery: undefined, where: undefined, schema: undefined, table: undefined, dataType: undefined, constraint: undefined, file: 'parse_relation.c', routine: 'parserOpenTable' })
        at <Jasmine>
        at Parser.parseErrorMessage (/home/runner/work/parse-server/parse-server/node_modules/pg-protocol/dist/parser.js:278:15)
        at Parser.handlePacket (/home/runner/work/parse-server/parse-server/node_modules/pg-protocol/dist/parser.js:126:29)
        at Parser.parse (/home/runner/work/parse-server/parse-server/node_modules/pg-protocol/dist/parser.js:39:38)
        at Socket.stream.on (/home/runner/work/parse-server/parse-server/node_modules/pg-protocol/dist/index.js:10:42)
        at Socket.emit (events.js:198:13)
        at Socket.EventEmitter.emit (domain.js:448:20)
        at addChunk (_stream_readable.js:288:12)
        at readableAddChunk (_stream_readable.js:269:11)
        at Socket.Readable.push (_stream_readable.js:224:10)
        at TCP.onStreamRead [as onread] (internal/stream_base_commons.js:94:17)

I beleive this one is another Flaky test. I will take a look at it as well.

@davimacedo
Copy link
Member Author

After a really hard time understanding what was happening it turned out that the problem was in our side. After every test-case, we delete all the data, but, the process of saving the message and status of a job is asynchronous. So it was happening that, in some cases, the test was finishing (and the data getting deleted) before the status and message update.

@mtrezza
Copy link
Member

mtrezza commented Feb 18, 2021

Was the test case not async/await?

it(... async () {
});

I think that would not end the test and delete the data before the test is done. Or even if it is not async/await, the test wouldn't finish before done() is called.

@davimacedo
Copy link
Member Author

davimacedo commented Feb 18, 2021

The tests currently are like this:

it('should define a job', done => {
      expect(() => {
        Parse.Cloud.job('myJob', () => {});
      }).not.toThrow();

      request({
        method: 'POST',
        url: 'http://localhost:8378/1/jobs/myJob',
        headers: {
          'X-Parse-Application-Id': Parse.applicationId,
          'X-Parse-Master-Key': Parse.masterKey,
        },
      }).then(
        () => {
          done();
        },
        err => {
          fail(err);
          done();
        }
      );
    });

Basically they do the api call to start the job and then call the done() function to finish the test. The problem happens because this api only starts the job and leaves it running in background as you can see here. So sometimes the done() function is called and the data is deleted before the job running in background has actually completed.

The changes in this PR makes the tests to wait for the job completition.

@mtrezza
Copy link
Member

mtrezza commented Feb 18, 2021

Makes sense now, thanks. Hats off if that finally fixes these tests. 🤞

@davimacedo davimacedo merged commit 603cc1f into parse-community:master Feb 18, 2021
dplewis pushed a commit that referenced this pull request Feb 21, 2021
* Fix error when a not yet inserted job is updated

* Add entry to changelog

* revert the upsert change and fix the test

* Revert the change so job execute a single time

* Fix other tests with potential similar problem
@parseplatformorg
Copy link
Contributor

🎉 This change has been released in version 5.0.0-beta.1

@parseplatformorg parseplatformorg added the state:released-beta Released as beta version label Nov 1, 2021
@mtrezza mtrezza mentioned this pull request Mar 12, 2022
@parseplatformorg
Copy link
Contributor

🎉 This change has been released in version 5.0.0

@parseplatformorg parseplatformorg added the state:released Released as stable version label Mar 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state:released Released as stable version state:released-beta Released as beta version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants