Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Restrict importing URLs which are not allowlisted (by domain) for the API key #511

Merged
merged 15 commits into from
Sep 27, 2024

Conversation

swetabar
Copy link
Contributor

Related Issues

SITES-22756

An API key owner has to specifically allowlist domains before they can be processed by Import as a Service.
We do not want to allow imports on domains which are not part of the allowlist for a given user.

@swetabar swetabar added the enhancement New feature or request label Sep 25, 2024
@swetabar swetabar self-assigned this Sep 25, 2024
Copy link

This PR will trigger a minor release when merged.

/**
* Check if the URLs in urlList belong to any of the base domains.
* @param urlList
* @param baseDomainList
Copy link

@codevoyage codevoyage Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for the @param tags can be improved for clarity, for example:

 * Check if all URLs in the list belong to any of the allowed base domains.
 * @param {string[]} urlList - List of URLs to check.
 * @param {string[]} baseDomainList - List of allowed base domains.

Same for the other functions.

log.error(`Missing imports.write scope for creating a new import job: ${error.message}`);
// Check if the user has the write_all_domains API scope.
// This scope is restricted to a few users.
validateAccessScopes([SCOPE.WRITE_ALL_DOMAINS]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be validating that the user has this scope here. Looking at the log trace it would seem odd that we got an error, yet continued.

I think we should actually rebrand this scope from WRITE_ALL_DOMAINS to ALL_DOMAINS. Then users who have elevated powers should have both the WRITE + ALL_DOMAINS scopes. The initial scope validation done for SCOPE.WRITE on line 201 makes sense to me answering the question does the user have the privilege to create jobs, if so continue.

Then later on, validateUrlsByDomain shouldn't care about the scopes. The code could be wrapped with a scope check,

if (!validateAccsesScope([SCOPE.WRITE_ALL_DOMAINS]) {
   validateUrlsByDomain(urls);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method validateAccessScope throws an error in case the scopes do not match. So we might need to use this check instead:

if (!scopes.some((scope) => scope.name === SCOPE.WRITE_ALL_DOMAINS)) {
   validateUrlsByDomain(urls);
}

Comment on lines 166 to 168
if (scopes.some((scope) => scope.name === SCOPE.WRITE_ALL_DOMAINS)) {
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably stick to using the validateAccsesScope to tell us if the user has the scope. See my comment below about possibly removing scopes from this function all together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bhellema - validateAccessScope is just checking if the user has the given scope.
The scopes information (with the name and domains) is loaded to the profile later.
So, we cannot check if the scope has the required domains in that function.

Copy link
Contributor Author

@swetabar swetabar Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a function validateScopeDomainsIsNotEmpty - to check if the domain list is not empty for a given input scope?
We will invoke it for APIs requiring scopes imports.read and imports.write. We will not need to invoke it for API that require scopes imports.read_all and for imports.write_all.

Copy link
Contributor Author

@swetabar swetabar Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option could be returning the profile information in validateAccessScope -> const { authInfo: { profile } } = attributes;

In that case we could add the check for empty domain lists for scopes with read / write access.

Comment on lines 132 to 133
* @param inputUrl
* @return domain
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flush out missing jsdocs

Comment on lines 143 to 144
* @param urlList
* @param baseDomainList
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flush out missing jsdocs

Comment on lines 152 to 154
if (!belongsToBaseDomain) {
throw new ErrorWithStatusCode(`Invalid request: URL: ${inputUrl} not allowed`, STATUS_BAD_REQUEST);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's collect all the invalid URLs then throw the error if the list has a length > 0. That way the user isn't going back and forth potentially. We should be able to tell them the potential list of invalid URLs they provided.

Comment on lines 161 to 162
* @param urls
* @param scopes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flush out missing jsdocs

Comment on lines 173 to 175
if (!scope.domains || scope.domains.length === 0) {
throw new ErrorWithStatusCode('Missing domain information', STATUS_UNAUTHORIZED);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should be done as part of validateAccessScopes. If we have a scope of SCOPE.WRITE (and does not have _WRITE__ALL_DOMAINS) then verify that the domains are set.

@@ -156,6 +222,10 @@ function ImportController(context) {
urls, options, importScript, customHeaders,
} = multipartFormData;

validateUrlsByDomain(urls, profile.getScopes());

log.debug(`Creating a new import job with ${urls.length} URLs.`);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason this is a debug-level log and not info?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to that Shikha. Having the logs tell us the story in Coralogix is critical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could modify it to an info log. I kept it as debug, to not pollute the logs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs are critical to the tracing story. It's critical that we are able to track the progress of the job. It's absolutely okay to be logging information like this, and I would encourage more of it so that we can clearly see what's happening.


// We only check if the URLs belong to the allowed domains if the user has the write scope
// We do not need to check the domains for users with scope: all_domains
if (!scopes.some((scope) => scope.name === SCOPE.ALL_DOMAINS)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use flatMap.

if (!scopes.some(scope => scope.name === SCOPE.ALL_DOMAINS)) {
  const allowedDomains = scopes
    .filter(scope => scope.name === SCOPE.WRITE && scope.domains && scope.domains.length > 0)
    .flatMap(scope => scope.domains.map(getDomain));

  if (allowedDomains.length === 0) {
    throw new ErrorWithStatusCode('Missing domain information', STATUS_UNAUTHORIZED);
  }

  isUrlInBaseDomains(urls, allowedDomains);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I'll add that in!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little helper function to perform this check might improve readability, such as:

Suggested change
if (!scopes.some((scope) => scope.name === SCOPE.ALL_DOMAINS)) {
if (hasAllDomainsScope(scopes)) {

@swetabar swetabar merged commit f8c7a84 into main Sep 27, 2024
4 checks passed
@swetabar swetabar deleted the SITES-22756-restrict-importing-urls branch September 27, 2024 15:55
solaris007 pushed a commit that referenced this pull request Sep 27, 2024
# [1.67.0](v1.66.3...v1.67.0) (2024-09-27)

### Features

* Restrict importing URLs which are not allowlisted (by domain) for the API key ([#511](#511)) ([f8c7a84](f8c7a84))
@solaris007
Copy link
Member

🎉 This issue has been resolved in version 1.67.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

const scopes = profile.getScopes();

// We only check if the URLs belong to the allowed domains if the user has the write scope
// We do not need to check the domains for users with scope: all_domains

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is no check for the "write" scope for all_domains, then what happens if the user only has the ALL_DOMAINS scope and not the WRITE scope?

Copy link
Contributor

@blefebvre blefebvre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoot, looks like I just missed getting my comments in. Few minor items to consider.

@@ -77,6 +77,7 @@
"@aws-sdk/s3-request-presigner": "3.654.0",
"@slack/bolt": "3.21.4",
"busboy": "1.6.0",
"psl": "^1.9.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the ^ for consistency with other dependencies.

* @return {true} if all URLs belong to an allowed base domain
* @throws {ErrorWithStatusCode} if any URL does not belong to an allowed base domain
*/
function isUrlInBaseDomains(urlList, baseDomainList) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: isUrlInAllowedDomains

});

if (invalidUrls.length > 0) {
throw new ErrorWithStatusCode(`Invalid request: URLs not allowed: ${invalidUrls.join(', ')}`, STATUS_BAD_REQUEST);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could lead to a very large x-error header, which might lead to other issues. Perhaps just include the first invalid URL, as an example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this change after initial reviews - we wanted to prevent back and forth with the customer if there are multiple URLs which are invalid. But, I could change it to the first URL.


// We only check if the URLs belong to the allowed domains if the user has the write scope
// We do not need to check the domains for users with scope: all_domains
if (!scopes.some((scope) => scope.name === SCOPE.ALL_DOMAINS)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little helper function to perform this check might improve readability, such as:

Suggested change
if (!scopes.some((scope) => scope.name === SCOPE.ALL_DOMAINS)) {
if (hasAllDomainsScope(scopes)) {

});

it('should create an import job for the user scope imports.write', async () => {
baseContext.attributes.authInfo.profile.getScopes = () => [{ name: 'imports.write', domains: ['https://www.example.com'] }, { name: 'imports.read', domains: ['https://www.example.com'] }];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ name: 'imports.read', domains: ['https://www.example.com']

Is domains needed/supported on imports.read?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we do not need it. I added this unit test to test that the code picks up domains from only WRITE scope and not READ.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request released
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants