-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Restrict importing URLs which are not allowlisted (by domain) for the API key #511
Conversation
This PR will trigger a minor release when merged. |
src/controllers/import.js
Outdated
/** | ||
* Check if the URLs in urlList belong to any of the base domains. | ||
* @param urlList | ||
* @param baseDomainList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation for the @param tags can be improved for clarity, for example:
* Check if all URLs in the list belong to any of the allowed base domains.
* @param {string[]} urlList - List of URLs to check.
* @param {string[]} baseDomainList - List of allowed base domains.
Same for the other functions.
src/controllers/import.js
Outdated
log.error(`Missing imports.write scope for creating a new import job: ${error.message}`); | ||
// Check if the user has the write_all_domains API scope. | ||
// This scope is restricted to a few users. | ||
validateAccessScopes([SCOPE.WRITE_ALL_DOMAINS]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be validating that the user has this scope here. Looking at the log trace it would seem odd that we got an error, yet continued.
I think we should actually rebrand this scope from WRITE_ALL_DOMAINS to ALL_DOMAINS. Then users who have elevated powers should have both the WRITE + ALL_DOMAINS scopes. The initial scope validation done for SCOPE.WRITE on line 201 makes sense to me answering the question does the user have the privilege to create jobs, if so continue.
Then later on, validateUrlsByDomain shouldn't care about the scopes. The code could be wrapped with a scope check,
if (!validateAccsesScope([SCOPE.WRITE_ALL_DOMAINS]) {
validateUrlsByDomain(urls);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method validateAccessScope
throws an error in case the scopes do not match. So we might need to use this check instead:
if (!scopes.some((scope) => scope.name === SCOPE.WRITE_ALL_DOMAINS)) {
validateUrlsByDomain(urls);
}
src/controllers/import.js
Outdated
if (scopes.some((scope) => scope.name === SCOPE.WRITE_ALL_DOMAINS)) { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably stick to using the validateAccsesScope to tell us if the user has the scope. See my comment below about possibly removing scopes from this function all together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bhellema - validateAccessScope
is just checking if the user has the given scope.
The scopes information (with the name and domains) is loaded to the profile later.
So, we cannot check if the scope has the required domains in that function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a function validateScopeDomainsIsNotEmpty
- to check if the domain list is not empty for a given input scope?
We will invoke it for APIs requiring scopes imports.read
and imports.write
. We will not need to invoke it for API that require scopes imports.read_all
and for imports.write_all
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option could be returning the profile information in validateAccessScope
-> const { authInfo: { profile } } = attributes;
In that case we could add the check for empty domain lists for scopes with read
/ write
access.
src/controllers/import.js
Outdated
* @param inputUrl | ||
* @return domain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flush out missing jsdocs
src/controllers/import.js
Outdated
* @param urlList | ||
* @param baseDomainList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flush out missing jsdocs
src/controllers/import.js
Outdated
if (!belongsToBaseDomain) { | ||
throw new ErrorWithStatusCode(`Invalid request: URL: ${inputUrl} not allowed`, STATUS_BAD_REQUEST); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's collect all the invalid URLs then throw the error if the list has a length > 0. That way the user isn't going back and forth potentially. We should be able to tell them the potential list of invalid URLs they provided.
src/controllers/import.js
Outdated
* @param urls | ||
* @param scopes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flush out missing jsdocs
src/controllers/import.js
Outdated
if (!scope.domains || scope.domains.length === 0) { | ||
throw new ErrorWithStatusCode('Missing domain information', STATUS_UNAUTHORIZED); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check should be done as part of validateAccessScopes. If we have a scope of SCOPE.WRITE (and does not have _WRITE__ALL_DOMAINS) then verify that the domains are set.
src/controllers/import.js
Outdated
@@ -156,6 +222,10 @@ function ImportController(context) { | |||
urls, options, importScript, customHeaders, | |||
} = multipartFormData; | |||
|
|||
validateUrlsByDomain(urls, profile.getScopes()); | |||
|
|||
log.debug(`Creating a new import job with ${urls.length} URLs.`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason this is a debug-level log and not info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to that Shikha. Having the logs tell us the story in Coralogix is critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could modify it to an info log. I kept it as debug, to not pollute the logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logs are critical to the tracing story. It's critical that we are able to track the progress of the job. It's absolutely okay to be logging information like this, and I would encourage more of it so that we can clearly see what's happening.
|
||
// We only check if the URLs belong to the allowed domains if the user has the write scope | ||
// We do not need to check the domains for users with scope: all_domains | ||
if (!scopes.some((scope) => scope.name === SCOPE.ALL_DOMAINS)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use flatMap.
if (!scopes.some(scope => scope.name === SCOPE.ALL_DOMAINS)) {
const allowedDomains = scopes
.filter(scope => scope.name === SCOPE.WRITE && scope.domains && scope.domains.length > 0)
.flatMap(scope => scope.domains.map(getDomain));
if (allowedDomains.length === 0) {
throw new ErrorWithStatusCode('Missing domain information', STATUS_UNAUTHORIZED);
}
isUrlInBaseDomains(urls, allowedDomains);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I'll add that in!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little helper function to perform this check might improve readability, such as:
if (!scopes.some((scope) => scope.name === SCOPE.ALL_DOMAINS)) { | |
if (hasAllDomainsScope(scopes)) { |
# [1.67.0](v1.66.3...v1.67.0) (2024-09-27) ### Features * Restrict importing URLs which are not allowlisted (by domain) for the API key ([#511](#511)) ([f8c7a84](f8c7a84))
🎉 This issue has been resolved in version 1.67.0 🎉 The release is available on:
Your semantic-release bot 📦🚀 |
const scopes = profile.getScopes(); | ||
|
||
// We only check if the URLs belong to the allowed domains if the user has the write scope | ||
// We do not need to check the domains for users with scope: all_domains |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is no check for the "write" scope for all_domains, then what happens if the user only has the ALL_DOMAINS scope and not the WRITE scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shoot, looks like I just missed getting my comments in. Few minor items to consider.
@@ -77,6 +77,7 @@ | |||
"@aws-sdk/s3-request-presigner": "3.654.0", | |||
"@slack/bolt": "3.21.4", | |||
"busboy": "1.6.0", | |||
"psl": "^1.9.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the ^ for consistency with other dependencies.
* @return {true} if all URLs belong to an allowed base domain | ||
* @throws {ErrorWithStatusCode} if any URL does not belong to an allowed base domain | ||
*/ | ||
function isUrlInBaseDomains(urlList, baseDomainList) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: isUrlInAllowedDomains
}); | ||
|
||
if (invalidUrls.length > 0) { | ||
throw new ErrorWithStatusCode(`Invalid request: URLs not allowed: ${invalidUrls.join(', ')}`, STATUS_BAD_REQUEST); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could lead to a very large x-error
header, which might lead to other issues. Perhaps just include the first invalid URL, as an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this change after initial reviews - we wanted to prevent back and forth with the customer if there are multiple URLs which are invalid. But, I could change it to the first URL.
|
||
// We only check if the URLs belong to the allowed domains if the user has the write scope | ||
// We do not need to check the domains for users with scope: all_domains | ||
if (!scopes.some((scope) => scope.name === SCOPE.ALL_DOMAINS)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little helper function to perform this check might improve readability, such as:
if (!scopes.some((scope) => scope.name === SCOPE.ALL_DOMAINS)) { | |
if (hasAllDomainsScope(scopes)) { |
}); | ||
|
||
it('should create an import job for the user scope imports.write', async () => { | ||
baseContext.attributes.authInfo.profile.getScopes = () => [{ name: 'imports.write', domains: ['https://www.example.com'] }, { name: 'imports.read', domains: ['https://www.example.com'] }]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ name: 'imports.read', domains: ['https://www.example.com']
Is domains
needed/supported on imports.read
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we do not need it. I added this unit test to test that the code picks up domains from only WRITE
scope and not READ
.
Related Issues
SITES-22756
An API key owner has to specifically allowlist domains before they can be processed by Import as a Service.
We do not want to allow imports on domains which are not part of the allowlist for a given user.