Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use tarfile filters (PEP 706) when unpacking? #12111

Closed
1 task done
encukou opened this issue Jun 27, 2023 · 3 comments · Fixed by #12214
Closed
1 task done

Use tarfile filters (PEP 706) when unpacking? #12111

encukou opened this issue Jun 27, 2023 · 3 comments · Fixed by #12214
Labels
S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior

Comments

@encukou
Copy link
Contributor

encukou commented Jun 27, 2023

Description

Hello,
This is a continuation of #11464, which unfortunately turned sour.

New security releases of Python add a filter argument to tarfile.extract, which allows filtering common security issues.
Python 3.12 will raise a DeprecationWarning if filter is not specified. I assume pip will want to avoid the warning.
Python 3.14 will change the default to tarfile.data_filter.

What are your thoughts on how to best handle this? Happy to send a PR after a discussion.

Complicating factors:

  • pip has its own elaborate code that's similar to data_filter, with different details that might need to be preserved
  • for symlinks, pip reaches into CPython internals (tar._extract_member), which may avoid both the filtering and the warning.

Possible solutions I see:

  • Use data_filter on Pythons that support it. Accept behaviour differences -- AFAIK the biggest one is in mode handling:
    • data_filter sets rwx?-??-? if executable or rw-?--?-- if not (?=keep info from tarball)
    • pip uses the OS default, then and adds all 3 executable bits if any are set in the tarball
  • Use data_filter if available to raise exceptions (files/links outside target, device files), use existing mechanism if the filter doesn't fail
  • Use the fully_trusted filter to silence warnings

In the previous issue, @pradyunsg said:

Our security model is that sdists involve arbitrary code execution, so they can do basically anything in the build process.

My point was that I think we should (and do) protect against certain classes of issues though, such as dumping files in arbitrary locations.

The latter is currently not true: pip's unpacking is vulnerable to symlink attacks. The attached reproducer makes pip download evil.tar.gz dump a file in /tmp/poc, without running arbitrary code.

As said before, given pip's security model this isn't a security issue -- AFAIK it could only become one if the function is copied/used as documented elsewhere.

Expected behavior

No response

pip version

main

Python version

3.12

OS

*nix, any

How to Reproduce

mkevil.py:

import tarfile

def mkinfo(name, **kwargs):
    tarinfo = tarfile.TarInfo(name=name)
    for name, value in kwargs.items():
        setattr(tarinfo, name, value)
    return tarinfo

with tarfile.open('evil.tar.gz', 'w:gz') as tf:
    tf.addfile(mkinfo('./pyproject.toml'))
    tf.addfile(mkinfo('./tmp', type=tarfile.SYMTYPE,
                      linkname='../../../../../../../../tmp'))
    tf.addfile(mkinfo('./tmp/poc'))
python mkevil.py
python -m pip download evil.tar.gz
ls ls /tmp/poc

Output

$ python mkevil.py 

$ python -m pip download evil.tar.gz
Looking in indexes: http://127.0.0.1:3141/root/pypi/+simple
Processing ./evil.tar.gz
  File was already downloaded /home/pviktori/dev/one-offs/pocs/tarfile-pip/evil.tar.gz
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Successfully downloaded UNKNOWN

$ ls /tmp/poc  # created by `pip download`
/tmp/poc

Code of Conduct

@encukou encukou added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Jun 27, 2023
@pfmoore
Copy link
Member

pfmoore commented Jun 27, 2023

+1 on ensuring that when unpacking tarfiles, pip never either creates files outside of the target directory, or creates links to files outside of the target directory. I don't believe there is any valid form of sdist that needs the capability to do this (and if there is, I'd support a PEP modifying the sdist spec to make it illegal 🙂)

I'd be slightly uncomfortable if we had to make such protection only available in some Python versions, but I could live with it.

I have no opinion regarding differences in mode handling. As a Windows user I don't think it affects me, and in general, I don't think we should be trying to preserve mode bits anyway. I suspect someone, somewhere, will have a sdist that expects a particular mode bit to be preserved from source tree, through sdist and wheel, to final install (unless some other part of the chain already breaks this!) Again, I'd be happy to support a PEP that made such expectations illegal, though 😉

This latter is my biggest concern. The sdist format is hugely under-specified, and people could easily be using it in ways we don't know about and wouldn't necessarily sanction. There's a risk that we could end up needing to write a PEP to tighten up the standards (pip's policy is that we implement standard-defined behaviour, so we wouldn't implement significant restrictions unilaterally).

@encukou
Copy link
Contributor Author

encukou commented Jun 29, 2023

I'd be slightly uncomfortable if we had to make such protection only available in some Python versions, but I could live with it.

It's now in the latest security releases of all supported Python versions.

unless some other part of the chain already breaks this!

It does: Python's zipfile ignores mode on extraction. (So do non-Windows filesystems and git.)
So I guess the worry would be that the wheel-generating code of a Unix-only project writes file contents differently depending on file modes.


Guess I should mention another potential incompatibility: historically tarfile honors owner user/group if run as the superuser, pip and data_filter ignore it.

@encukou
Copy link
Contributor Author

encukou commented Jul 4, 2023

The sdist format is hugely under-specified, and people could easily be using it in ways we don't know about and wouldn't necessarily sanction. There's a risk that we could end up needing to write a PEP to tighten up the standards (pip's policy is that we implement standard-defined behaviour, so we wouldn't implement significant restrictions unilaterally).

Here's the PEP proposal: https://discuss.python.org/t/using-tarfile-data-filter-for-source-distribution-extraction/28928

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants