-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: SITES-23133 [Importer] Report on the progress of a import job via a new jobs endpoint #508
Conversation
…a a new jobs endpoint
…a a new jobs endpoint
…a a new jobs endpoint * add write all domains scope
…a a new jobs endpoint
# Conflicts: # docs/index.html
…a a new jobs endpoint
…a a new jobs endpoint
…a a new jobs endpoint
…a a new jobs endpoint
This PR will trigger a patch release when merged. |
…a a new jobs endpoint
src/controllers/import.js
Outdated
* - failed: URLs that have failed to import. | ||
* - pending: URLs that are still being processed. | ||
* - redirected: URLs that have been redirected. | ||
* - running - URLs that are currently being imported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will running
ever be non-zero? Since we do not have access to the data layer from the content-scraper, we do not update the DB when we begin processing a URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is interesting. We could do two things I guess with this.
- remove running as it will never be non zero.
- implement a new feature that will allow the content-scraper to message back / use a lamda to update the status of the import url job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there value in constantly updating this field?
- Currently, it will always be
0
or1
since we have configured only a single Lambda invocation for the import queues, so effectively sequential processing - The cost of additional writes to the DB each time a new URL begins processing will add up for larger jobs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is value at this point updating the field, at this point. I think at this point I should just eliminate the concept of running from the api-service. However if we start running jobs in parallel, then I think it might be interesting to revisit this.
let updatedOptions = { ...options }; | ||
|
||
if (importScript) { | ||
updatedOptions = { ...updatedOptions, hasCustomImportJs: true }; | ||
} | ||
|
||
if (customHeaders) { | ||
updatedOptions = { ...updatedOptions, hasCustomHeaders: true }; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will we expose these values for consumption both by API clients, as well as for use by our dashboards? cc @swetabar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will have to provide new columns in the job for these values. We can log an new issue for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SITES-25634
…a a new jobs endpoint
…a a new jobs endpoint
READ: 'imports.read', // allows users to read the import jobs created with their API key | ||
WRITE: 'imports.write', // allows users to create new import jobs | ||
READ_ALL: 'imports.read_all', // allows users to view all import jobs | ||
WRITE_ALL_DOMAINS: 'imports.write_all_domains', // allows users to import across any domain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we include this new scope in spacecat-shared as well? I'm planning to add it with my PR for https://jira.corp.adobe.com/browse/SITES-22756
# [1.66.0](v1.65.12...v1.66.0) (2024-09-24) ### Features * SITES-23133 [Importer] Report on the progress of a import job via a new jobs endpoint ([#508](#508)) ([78ef726](78ef726))
🎉 This issue has been resolved in version 1.66.0 🎉 The release is available on:
Your semantic-release bot 📦🚀 |
When querying the job progress via the endpoint eg. /tools/import/jobs/e551798c-eb17-49bc-beb3-4e3737734db4/progress
The result over time will look something like this:
When getting the job status and the processor will also add the redirectCount to the result: