[doc] The fact that multiprocess.Queue uses serialization should be documented. #73159

Closed
Bernhard10 mannequin opened this issue Dec 14, 2016 · 10 comments
Assignees
Labels
docs Documentation in the Doc dir easy topic-multiprocessing

Comments

@Bernhard10
Mannequin

Bernhard10 mannequin commented Dec 14, 2016

BPO 28973
Nosy @bitdancer, @applio, @Bernhard10, @iritkatriel
Files
  • mwe.py: Minimal working example to reproduce this bug/surprising behaviour.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2016-12-14.14:38:29.506>
    labels = ['easy', '3.9', '3.10', '3.11', 'type-feature', 'docs']
    title = '[doc] The fact that multiprocess.Queue uses serialization should be documented.'
    updated_at = <Date 2021-08-07.13:17:33.028>
    user = 'https://github.com/Bernhard10'

    bugs.python.org fields:

    activity = <Date 2021-08-07.13:17:33.028>
    actor = 'r.david.murray'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation']
    creation = <Date 2016-12-14.14:38:29.506>
    creator = 'Bernhard10'
    dependencies = []
    files = ['45899']
    hgrepos = []
    issue_num = 28973
    keywords = ['easy']
    message_count = 8.0
    messages = ['283192', '283193', '283195', '283198', '283199', '283206', '398809', '399179']
    nosy_count = 5.0
    nosy_names = ['r.david.murray', 'docs@python', 'davin', 'Bernhard10', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue28973'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']


    @Bernhard10
    Mannequin Author

    Bernhard10 mannequin commented Dec 14, 2016

    When I did some tests involving unittest.mock.sentinel and multiprocessing.Queue, I noticed that multiprocessing.Queue changes the id of the sentinel.

    This behaviour is definitely surprising and not documented.

    @Bernhard10 Bernhard10 mannequin added the type-bug An unexpected behavior, bug, or error label Dec 14, 2016
    @Bernhard10
    Mannequin Author

    Bernhard10 mannequin commented Dec 14, 2016

    See http://stackoverflow.com/a/925241/5069869

    Apparently multiprocessing.Queue uses pickle to serialize the objects in the queue, which explains the change of identity, but is absolutely unclear from the documentation.
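
The identity change described above can be reproduced without a second process. The following is a minimal sketch, not the attached mwe.py: the helper name `round_trip_through_queue` and the sample payload are made up for illustration. It puts a list on a `multiprocessing.Queue`, reads it back in the same process, and compares the result with a plain pickle round trip.

```python
import multiprocessing
import pickle


def round_trip_through_queue(obj):
    """Put obj on a multiprocessing.Queue and read it back in the same process.

    Hypothetical helper for illustration; not part of the multiprocessing API.
    """
    queue = multiprocessing.Queue()
    try:
        queue.put(obj)
        # get() waits until the feeder thread has flushed the pickled bytes
        # to the underlying pipe, then unpickles them into a new object.
        return queue.get(timeout=5)
    finally:
        queue.close()
        queue.join_thread()


original = ["payload", 42]
received = round_trip_through_queue(original)
copied = pickle.loads(pickle.dumps(original))

# Both round trips produce an equal object, but not the same object.
print("equal after queue:", received == original)
print("same object after queue:", received is original)
print("same object after pickle:", copied is original)
```

The received object compares equal to the original but is a distinct object, which is the same outcome as calling `pickle.loads(pickle.dumps(...))` directly.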

    @Bernhard10 Bernhard10 mannequin added the docs Documentation in the Doc dir label Dec 14, 2016
    @Bernhard10 Bernhard10 mannequin changed the title multiprocess.Queue changes objects identity The fact that multiprocess.Queue uses serialization should be documented. Dec 14, 2016
    @Bernhard10 Bernhard10 mannequin assigned docspython Dec 14, 2016
    @bitdancer
    Member

    The fact that this is so is implicit in the name multi*process*ing and the documented restrictions of the id function. That is, the purpose of the module is to manage computation across multiple processes. Since different processes have distinct memory spaces, you cannot depend on object identity between processes, by the definition of object identity (it is constant only for the lifetime of the object in memory, and the different processes have different memory spaces, therefore the object id may be different in the different processes). By construction this applies also to any multiprocessing mechanism that is used to transmit objects, even if the transmission turns out to be to the same process in a particular case. You can't *depend* on the id in that case, because the transmission mechanism must be free to change the object identity in order to work in the general case.

    Should we document this explicitly? Perhaps so. Maybe in the multiprocessing introduction?
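
One practical consequence of this reasoning is that a stop marker compared with `is` does not survive transmission, while one compared by value does. The sketch below assumes a hypothetical worker protocol; the names `STOP_BY_IDENTITY`, `STOP_BY_VALUE` and `send_and_receive` are invented for illustration. It uses a `multiprocessing.Pipe` within a single process, the "same process" case mentioned above.

```python
import multiprocessing

# Hypothetical stop markers for a worker loop (illustrative names only).
STOP_BY_IDENTITY = object()  # only meaningful inside one process
STOP_BY_VALUE = "STOP"       # compared with ==, so it survives pickling


def send_and_receive(obj):
    """Send obj through a one-way multiprocessing.Pipe and receive it again."""
    receiver, sender = multiprocessing.Pipe(duplex=False)
    try:
        sender.send(obj)        # pickles obj and writes the bytes to the pipe
        return receiver.recv()  # unpickles the bytes into a new object
    finally:
        sender.close()
        receiver.close()


received_identity_marker = send_and_receive(STOP_BY_IDENTITY)
received_value_marker = send_and_receive(STOP_BY_VALUE)

# An identity check fails after transmission; an equality check still works.
print("identity marker recognised:", received_identity_marker is STOP_BY_IDENTITY)
print("value marker recognised:", received_value_marker == STOP_BY_VALUE)
```

A consumer that tests `item is STOP_BY_IDENTITY` would never see the marker, so markers sent through queues or pipes are better compared by value (or replaced by `None`).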

    @Bernhard10
    Mannequin Author

    Bernhard10 mannequin commented Dec 14, 2016

    My first thought was that Queue was implemented using shared memory.
    I guess from the fact that the "Shared memory" section is separate in the multiprocessing documentation I should have known better, though.

    So I guess some clarification in the documentation would be helpful.

    @bitdancer
    Member

    Yeah, that's why I said "in the general case". Making it clear in the overview seems reasonable to me.

    @applio
    Member

    applio commented Dec 14, 2016

    All communication between processes in multiprocessing has consistently used pickle to serialize the data being communicated (this includes what is described in the "Shared memory" section of the docs). The documentation has not done a great job of making this clear, instead only describing the requirement that data be pickleable in select places. For example, in the section on Queues:
    Note: When an object is put on a queue, the object is pickled and a
    background thread later flushes the pickled data to an underlying pipe.

    Though it only applies to 3.6+, bpo-28053 still needs its own documentation improvement to make clear that the mechanism for communicating data defaults to serialization by pickle but that this can be replaced by alternatives.

    I agree that the documentation around the use of pickle in multiprocessing deserves improvement.
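
Because data must be pickleable to be sent, it can help to check an object before handing it to a queue or pipe. The following is a rough sketch under one stated assumption: it uses plain `pickle.dumps` as an approximation, whereas multiprocessing uses its own pickler with extra reducers, so the result may differ for some types (connections and sockets, for example). The helper name `is_picklable` is hypothetical.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise.

    Approximation only: multiprocessing registers additional reducers,
    so some objects rejected here may still be transmissible.
    """
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


candidates = {
    "list of ints": [1, 2, 3],
    "thread lock": threading.Lock(),
}
results = {label: is_picklable(obj) for label, obj in candidates.items()}

for label, ok in results.items():
    print(f"{label}: {'picklable' if ok else 'not picklable'}")
```

Checking up front is useful with queues in particular, since the object is pickled by a background thread after `put()` has been called.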

    @applio applio added 3.7 (EOL) end of life type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Dec 14, 2016
    @iritkatriel
    Member

    There is a note mentioning pickle in this section: https://docs.python.org/3/library/multiprocessing.html#pipes-and-queues

    It starts with "When an object is put on a queue, the object is pickled and..."

    A comment about the object ids can be added there.

    @iritkatriel iritkatriel added easy 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes and removed 3.7 (EOL) end of life labels Aug 2, 2021
    @iritkatriel iritkatriel changed the title The fact that multiprocess.Queue uses serialization should be documented. [doc] The fact that multiprocess.Queue uses serialization should be documented. Aug 2, 2021
    @bitdancer
    Member

    Mentioning ids would be pretty much redundant with mentioning pickle. If it is pickled its id is going to change. I think Davin was suggesting that while the use of serialization is documented, it is not documented *consistently*. Everywhere serialization happens it should be mentioned in the docs.

    Regardless, a proposed doc PR is the way forward here.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @digitalfotografen
    Contributor

    I am starting a doc PR on this during this weekend's EuroPython Sprints and plan to have it finished by the end of the weekend.

    @gpshead gpshead self-assigned this Jul 13, 2024
    @gpshead gpshead removed 3.11 only security fixes 3.10 only security fixes 3.9 only security fixes type-feature A feature request or enhancement labels Jul 13, 2024
    gpshead pushed a commit that referenced this issue Jul 13, 2024
    … are pickled. (GH-121686)
    
    Added explicit comments about that objects are pickled when transmitted via multiprocessing queues and pipes.
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 13, 2024
    …bjects are pickled. (pythonGH-121686)
    
    Added explicit comments about that objects are pickled when transmitted via multiprocessing queues and pipes.
    (cherry picked from commit b580589)
    
    Co-authored-by: Ulrik Södergren <[email protected]>
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 13, 2024
    …bjects are pickled. (pythonGH-121686)
    
    Added explicit comments about that objects are pickled when transmitted via multiprocessing queues and pipes.
    (cherry picked from commit b580589)
    
    Co-authored-by: Ulrik Södergren <[email protected]>
    @gpshead
    Member

    gpshead commented Jul 13, 2024

    Thanks! This might be seen as adding some redundancy in the multiprocessing docs, but that's a larger "should this doc be rewritten from a different perspective?" kind of issue. This new text at least gets the important information stated directly, in the places I expect readers are likely to see it, rather than leaving them to infer it from other notes in the docs.

    @gpshead gpshead closed this as completed Jul 13, 2024
    gpshead pushed a commit that referenced this issue Jul 13, 2024
    …objects are pickled. (GH-121686) (#121728)
    
    gh-73159 Added clarifications in multiprocessing docs on that objects are pickled. (GH-121686)
    
    Added explicit comments about that objects are pickled when transmitted via multiprocessing queues and pipes.
    (cherry picked from commit b580589)
    
    Co-authored-by: Ulrik Södergren <[email protected]>
    gpshead pushed a commit that referenced this issue Jul 13, 2024
    …objects are pickled. (GH-121686) (#121727)
    
    gh-73159 Added clarifications in multiprocessing docs on that objects are pickled. (GH-121686)
    
    Added explicit comments about that objects are pickled when transmitted via multiprocessing queues and pipes.
    (cherry picked from commit b580589)
    
    Co-authored-by: Ulrik Södergren <[email protected]>
    estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
    …bjects are pickled. (pythonGH-121686)
    
    Added explicit comments about that objects are pickled when transmitted via multiprocessing queues and pipes.