-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential false positive for "Uncontrolled data used in path expression" alert #17226
Comments
Update: from pathlib import Path
from os.path import normpath
def validate_file_path(file: Path) -> Path:
"""
Validates the input file paths used for the workflow, making sure that they are
safe to be executed and passed on to subsequent stages of the workflow. Returns
a validated, fully resolved version of the file path.
"""
file = Path(file) if isinstance(file, str) else file
file = file.resolve() # Get full path for inspection
# Validate file types
if not file.exists():
raise Exception(f"{file!r} doesn't exist")
if not file.is_file():
raise Exception(f"{file!r} is not a file")
if file.suffix not in valid_file_types:
raise Exception(f"{file!r} is not a valid file type")
# Try validating with os.path functions
full_path = normpath(str(file))
# Use path to storage location as reference to verify basepath is correct
base_path = normpath("/path/to/where/data/should/be/stored")
if str(full_path).startswith(str(base_path)):
return Path(full_path).resolve()
else:
raise Exception(f"{file!r} points to a directory that is not permitted")
# The code block then becomes:
incoming_file = validate_file(file)
"""
Code block to register file in the database goes here
"""
return True As per previous, would greatly appreciate your advice on whether this can be considered a false positive, or if I should further validate the file path. Thanks! |
I've made another attempt at resolving this error, with my validation + sanitisation function looking like this now: import re
from pathlib import Path
def validate_and_sanitise(file: Path) -> Path:
"""
Validates the input file paths used for the CLEM workflow, making sure that they
are safe to be executed and passed on to subsequent stages of the workflow. Returns
a validated, sanitised version of the file path.
"""
file = Path(file) if isinstance(file, str) else file
# Convert to Posix str and check that no forbidden characters are therre
full_path = file.resolve().as_posix()
if bool(re.fullmatch(r"^[\w\s\.\-/]+$", full_path)) is False:
raise ValueError(f"Unallowed characters present in {file!r}")
# Check that it's not accessing somehwere it's not allowed
base_path = Path("/path/to/where/data/should/be/stored")
if not str(full_path).startswith(str(list(base_path.parents)[-3])):
raise ValueError(f"{file!r} points to a directory that is not permitted")
valid_file = Path(full_path)
# Additional checks for file path given
if not valid_file.is_file():
raise ValueError(f"{file!r} is not a file")
if not valid_file.exists():
raise FileNotFoundError(f"{file!r} doesn't exist")
if valid_file.suffix not in valid_file_types:
raise ValueError(f"{file!r} is not a valid file format")
return valid_file
# How it's used in the database function
def get_db_entry(...):
# Validate file path if provided
if file_path is not None:
try:
file_path = validate_and_sanitise(file_path)
except Exception:
raise Exception
"""
Register file with database here
"""
return db_entry Unfortunately, this attempt also still fails to satisfy the CodeQL warning, so as previous, advice would be appreciated on how I could improve this check if it is indeed not a false positive. |
Just another update on this situation: I've finally been able to write a version of the sanitisation check that satisfies the CodeQL alert: import re
from os import path
from pathlib import Path
def validate_and_sanitise(file: Path) -> Path:
"""
Performs validation and sanitisation on the incoming file paths, ensuring that
no forbidden characters are present and that the the path points only to allowed
sections of the file server.
Returns the file path as a sanitised string that can be converted into a Path
object again.
"""
full_path = path.normpath(path.realpath(file))
if bool(re.fullmatch(r"^[\w\s\.\-/]+$", full_path)) is False:
raise ValueError(f"Unallowed characters present in {file!r}")
# Check that it's not accessing somehwere it's not allowed
storage_path = Path("/path/to/where/data/is/stored")
base_path = list(storage_path.parents)[-3].as_posix()
if not str(full_path).startswith(str(base_path)):
raise ValueError(f"{file!r} points to a directory that is not permitted")
# Additional checks
if f".{full_path.rsplit('.', 1)[-1]}" not in valid_file_types:
raise ValueError("File is not a permitted file format")
return Path(full_path)
# The rest of the function proceeds as previous
def get_db_entry(...):
# Validate file path if provided
if file_path is not None:
try:
file_path = validate_and_sanitise(file_path)
except Exception:
raise Exception
"""
Register file with database here
"""
return db_entry Trying to resolve this issue revealed some interesting things about what CodeQL flags as an alert:
I've resolved this alert for my project now, but it might be worth considering discussing supporting the use of Looking forward to hearing your team's thoughts on this matter going forward. Thanks! |
Description of the false positive
I'm writing functions to add files to an SQL database, and CodeQL has flagged that the file paths are a potential security risk.
I have constructed a basic validator to make sure that only file paths that exist and start with a given base path when resolved are accepted for subsequent processing. However, CodeQL still views this as being insufficient.
Is this a false positive, or can my validation check be further enhanced? Note that I use
resolve
so that I can get and compare the start of the file paths.Code samples or links to source code
URL to the alert on GitHub code scanning (optional)
https://github.com/DiamondLightSource/python-murfey/pull/321/checks?check_run_id=28770940998
If this isn't a false positive, and CodeQL is working as intended, advice on mitigating the security issue in this context would be much appreciated. Thanks!
The text was updated successfully, but these errors were encountered: