Markdown is not always converted to python in multiprocessing/threading #9134

Closed
cameronraysmith opened this issue Mar 19, 2024 · 10 comments · Fixed by #10536
Assignees
Labels
bug Something isn't working jupyter
Milestone

cameronraysmith commented Mar 19, 2024

See quarto-dev/quarto#401 regarding the impact of the same issue on the Quarto VS Code extension.

Bug description

I recognize this is not an ideal design pattern, but calling a third-party library that uses multiprocessing and threading, as shown in the minimal example below, results in an error suggesting that the markdown may be passed to the process without being converted to Python. The source code in the example works as expected in a Python interpreter or Jupyter kernel.

Steps to reproduce

near-minimal example

---
title: "Markdown is not always converted to python in multiprocessing/threading"
format: html
execute:
  enabled: true
jupyter:
  kernelspec:
    display_name: "Python 3"
    language: python
    name: python3
---

## Minimal example

This notebook demonstrates a potential issue with rendering notebooks using `multiprocessing.Manager().Queue()` and `threading.Thread`.

```{python}
from multiprocessing import Manager
from threading import Thread
try:
    from tqdm import tqdm
except ImportError:
    tqdm = None

def update(progress_bar, queue, total):
    """Update progress bar based on values from the queue."""
    for _ in range(total):
        queue.get()
        if progress_bar is not None:
            progress_bar.update(1)

def simulate_issue():
    range_length = 10
    unit = "items"
    
    progress_bar = None if tqdm is None else tqdm(total=range_length, unit=unit)
    queue = Manager().Queue()
    thread = Thread(target=update, args=(progress_bar, queue, range_length))
    thread.start()
    
    for _ in range(range_length):
        queue.put('done')
    
    thread.join()
    if progress_bar is not None:
        progress_bar.close()
```

executing `simulate_issue()` proceeds without error in a jupyter notebook or ipython terminal

```{python}
simulate_issue()
```

but leads to

```pytb
  0%|          | 0/10 [00:00<?, ?items/s]Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "~/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 288, in run_path
    code, fname = _get_code_from_file(run_name, path_name)
  File "~/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 257, in _get_code_from_file
    code = compile(f.read(), fname, 'exec')
  File "~/template.qmd", line 1
    ---
       ^
SyntaxError: invalid syntax
```

when executed with quarto render.

Expected behavior

The notebook should render without error.

Actual behavior

The following error is surfaced from the cli

❯ quarto render template.qmd --debug

Starting python3 kernel...Done

Executing 'template.ipynb'
  Cell 1/2: ''...Done
  Cell 2/2: ''...ERROR: 

An error occurred while executing the following cell:
------------------
simulate_issue()
------------------

----- stderr -----
  0%|          | 0/10 [00:00<?, ?items/s]
------------------

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 simulate_issue()

Cell In[1], line 20, in simulate_issue()
     17 unit = "items"
     19 progress_bar = None if tqdm is None else tqdm(total=range_length, unit=unit)
---> 20 queue = Manager().Queue()
     21 thread = Thread(target=update, args=(progress_bar, queue, range_length))
     22 thread.start()

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/context.py:57, in BaseContext.Manager(self)
     55 from .managers import SyncManager
     56 m = SyncManager(ctx=self.get_context())
---> 57 m.start()
     58 return m

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/managers.py:562, in BaseManager.start(self, initializer, initargs)
    560 ident = ':'.join(str(i) for i in self._process._identity)
    561 self._process.name = type(self).__name__  + '-' + ident
--> 562 self._process.start()
    564 # get address of server
    565 writer.close()

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/process.py:121, in BaseProcess.start(self)
    118 assert not _current_process._config.get('daemon'), \
    119        'daemonic processes are not allowed to have children'
    120 _cleanup()
--> 121 self._popen = self._Popen(self)
    122 self._sentinel = self._popen.sentinel
    123 # Avoid a refcycle if the target function holds an indirect
    124 # reference to the process object (see bpo-30775)

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/context.py:288, in SpawnProcess._Popen(process_obj)
    285 @staticmethod
    286 def _Popen(process_obj):
    287     from .popen_spawn_posix import Popen
--> 288     return Popen(process_obj)

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/popen_spawn_posix.py:32, in Popen.__init__(self, process_obj)
     30 def __init__(self, process_obj):
     31     self._fds = []
---> 32     super().__init__(process_obj)

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/popen_fork.py:19, in Popen.__init__(self, process_obj)
     17 self.returncode = None
     18 self.finalizer = None
---> 19 self._launch(process_obj)

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/popen_spawn_posix.py:42, in Popen._launch(self, process_obj)
     40 tracker_fd = resource_tracker.getfd()
     41 self._fds.append(tracker_fd)
---> 42 prep_data = spawn.get_preparation_data(process_obj._name)
     43 fp = io.BytesIO()
     44 set_spawning_popen(self)

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py:183, in get_preparation_data(name)
    180 # Figure out whether to initialise main in the subprocess as a module
    181 # or through direct execution (or to leave it alone entirely)
    182 main_module = sys.modules['__main__']
--> 183 main_mod_name = getattr(main_module.__spec__, "name", None)
    184 if main_mod_name is not None:
    185     d['init_main_from_name'] = main_mod_name

AttributeError: module '__main__' has no attribute '__spec__'

while a different error occurs from the VS Code extension:

  0%|          | 0/10 [00:00<?, ?items/s]Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "~/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 288, in run_path
    code, fname = _get_code_from_file(run_name, path_name)
  File "~/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 257, in _get_code_from_file
    code = compile(f.read(), fname, 'exec')
  File "~/template.qmd", line 1
    ---
       ^
SyntaxError: invalid syntax

Your environment

environment

❯ code --version                         
1.87.2
863d2581ecda6849923a2118d93a088b0745d9d6
arm64

❯ uname -a                               
Darwin 22.6.0 Darwin Kernel Version 22.6.0: Fri Sep 15 13:41:28 PDT 2023; root:xnu-8796.141.3.700.8~1/RELEASE_ARM64_T6000 arm64

❯ sw_vers -productVersion          
13.6

Quarto check output

quarto check

❯ quarto check                                                                              
Quarto 1.4.551
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.11: OK
      Dart Sass version 1.69.5: OK
      Deno version 1.37.2: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.4.551
      Path: /Applications/quarto/bin

[✓] Checking tools....................OK
      TinyTeX: (not installed)
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: Installation From Path
      Path: /Library/TeX/texbin
      Version: 2023

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.10.13
      Path: /xxxx-py3.10/bin/python3
      Jupyter: 5.3.0
      Kernels: ir, julia-1.9, bash, maxima, python3

[✓] Checking Jupyter engine render....OK

@cameronraysmith cameronraysmith added the bug Something isn't working label Mar 19, 2024
@cscheid cscheid self-assigned this Mar 19, 2024
@cscheid cscheid added this to the v1.5 milestone Mar 19, 2024
Collaborator

cscheid commented Mar 19, 2024

Thanks for the report. I get a different error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 simulate_issue()

Cell In[1], line 21, in simulate_issue()
     18 unit = "items"
     20 progress_bar = None if tqdm is None else tqdm(total=range_length, unit=unit)
---> 21 queue = Manager().Queue()
     22 thread = Thread(target=update, args=(progress_bar, queue, range_length))
     23 thread.start()

File /opt/homebrew/Cellar/[email protected]/3.10.13_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py:57, in BaseContext.Manager(self)
     55 from .managers import SyncManager
     56 m = SyncManager(ctx=self.get_context())
---> 57 m.start()
     58 return m

File /opt/homebrew/Cellar/[email protected]/3.10.13_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/managers.py:562, in BaseManager.start(self, initializer, initargs)
    560 ident = ':'.join(str(i) for i in self._process._identity)
    561 self._process.name = type(self).__name__  + '-' + ident
--> 562 self._process.start()
    564 # get address of server
    565 writer.close()

File /opt/homebrew/Cellar/[email protected]/3.10.13_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py:121, in BaseProcess.start(self)
    118 assert not _current_process._config.get('daemon'), \
    119        'daemonic processes are not allowed to have children'
    120 _cleanup()
--> 121 self._popen = self._Popen(self)
    122 self._sentinel = self._popen.sentinel
    123 # Avoid a refcycle if the target function holds an indirect
    124 # reference to the process object (see bpo-30775)

File /opt/homebrew/Cellar/[email protected]/3.10.13_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py:288, in SpawnProcess._Popen(process_obj)
    285 @staticmethod
    286 def _Popen(process_obj):
    287     from .popen_spawn_posix import Popen
--> 288     return Popen(process_obj)

File /opt/homebrew/Cellar/[email protected]/3.10.13_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py:32, in Popen.__init__(self, process_obj)
     30 def __init__(self, process_obj):
     31     self._fds = []
---> 32     super().__init__(process_obj)

File /opt/homebrew/Cellar/[email protected]/3.10.13_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_fork.py:19, in Popen.__init__(self, process_obj)
     17 self.returncode = None
     18 self.finalizer = None
---> 19 self._launch(process_obj)

File /opt/homebrew/Cellar/[email protected]/3.10.13_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py:42, in Popen._launch(self, process_obj)
     40 tracker_fd = resource_tracker.getfd()
     41 self._fds.append(tracker_fd)
---> 42 prep_data = spawn.get_preparation_data(process_obj._name)
     43 fp = io.BytesIO()
     44 set_spawning_popen(self)

File /opt/homebrew/Cellar/[email protected]/3.10.13_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py:183, in get_preparation_data(name)
    180 # Figure out whether to initialise main in the subprocess as a module
    181 # or through direct execution (or to leave it alone entirely)
    182 main_module = sys.modules['__main__']
--> 183 main_mod_name = getattr(main_module.__spec__, "name", None)
    184 if main_mod_name is not None:
    185     d['init_main_from_name'] = main_mod_name

AttributeError: module '__main__' has no attribute '__spec__'

It seems that spawn.py performs some kind of main module inspection in order to do its work, and this is failing for me. Looking at the error you're getting, it looks like the execution in your case is going further, and failing when attempting to parse "~/template.qmd", which I assume is the file you're using for this report.

I'm just confused here, because it seems that if runpy is parsing the input, then there's no way it would work for .ipynb files either. (I believe you that it works; it just means that something else is going on)

Collaborator

cscheid commented Mar 19, 2024

The relevant code from multiprocessing/spawn.py looks like this for me on Python 3.10:

def _fixup_main_from_path(main_path):
    # If this process was forked, __main__ may already be populated
    current_main = sys.modules['__main__']

    # Unfortunately, the main ipython launch script historically had no
    # "if __name__ == '__main__'" guard, so we work around that
    # by treating it like a __main__.py file
    # See https://github.com/ipython/ipython/issues/4698
    main_name = os.path.splitext(os.path.basename(main_path))[0]
    if main_name == 'ipython':
        return

    # Otherwise, if __file__ already has the setting we expect,
    # there's nothing more to do
    if getattr(current_main, '__file__', None) == main_path:
        return

    # If the parent process has sent a path through rather than a module
    # name we assume it is an executable script that may contain
    # non-main code that needs to be executed
    old_main_modules.append(current_main)
    main_module = types.ModuleType("__mp_main__")
    main_content = runpy.run_path(main_path,
                                  run_name="__mp_main__")
    main_module.__dict__.update(main_content)
    sys.modules['__main__'] = sys.modules['__mp_main__'] = main_module

Note the specific code paths to handle IPython execution there. I think this means that multiprocessing.Spawn is unlikely to ever work in Quarto without explicit support of some kind.
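To make the failure in the reported traceback concrete, here is a minimal sketch that uses only the standard library. It writes a stand-in `.qmd` file (the file name and contents are placeholders, not the reporter's actual file) and passes it to `runpy.run_path` with the same `run_name` that `_fixup_main_from_path` uses above. Since `runpy` compiles the file as Python source, the YAML front matter should fail to parse on its first line.

```python
import os
import runpy
import tempfile

# Placeholder document: YAML front matter followed by markdown, as in a .qmd file.
QMD_SOURCE = '---\ntitle: "demo"\n---\n\n## Minimal example\n'

with tempfile.TemporaryDirectory() as tmp:
    qmd_path = os.path.join(tmp, "template.qmd")
    with open(qmd_path, "w", encoding="utf-8") as handle:
        handle.write(QMD_SOURCE)

    try:
        # Same call shape as in _fixup_main_from_path: the file is treated as a script.
        runpy.run_path(qmd_path, run_name="__mp_main__")
        caught = None
    except SyntaxError as exc:
        caught = exc

if caught is not None:
    print(f"SyntaxError on line {caught.lineno} of {os.path.basename(qmd_path)}")
```

This only reproduces the parsing step; it does not explain how the `.qmd` path ends up being passed to the child process in the first place.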

Collaborator

cscheid commented Mar 19, 2024

> The following error is surfaced in an interactive session

Ok, I can now reproduce this error as well.

Collaborator

cscheid commented Mar 19, 2024

I managed to find an (exceedingly ugly) workaround for the issue when rendering to a terminal.

The terminal case

For reasons I don't at all understand, our subprocess that calls nbclient to execute the notebook cells (in src/resources/jupyter/notebook.py) creates a main module that is missing a __spec__ entry. However, if we explicitly set the globals() of the execution context to contain a __spec__ entry with value None, then the execution succeeds. I'll make that fix.

🤷
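The effect of that workaround can be sketched in isolation. The snippet below is an illustration only, not the code in src/resources/jupyter/notebook.py: `main_module_name` is a hypothetical helper that mirrors the lookup on line 183 of spawn.py in the traceback, and `fake_main` is a stand-in for the main module of the execution context.

```python
import types


def main_module_name(main_module):
    """Mirror the lookup shown at spawn.py line 183 in the traceback."""
    return getattr(main_module.__spec__, "name", None)


# Stand-in for a __main__ module that lacks a __spec__ entry.
fake_main = types.ModuleType("__main__")
fake_main.__dict__.pop("__spec__", None)  # simulate the missing entry

try:
    main_module_name(fake_main)
    lookup_failed = False
except AttributeError:
    # Same failure class as "module '__main__' has no attribute '__spec__'".
    lookup_failed = True

# The workaround described above: define __spec__ explicitly with value None.
fake_main.__spec__ = None
name_after_fix = main_module_name(fake_main)  # getattr(None, "name", None)
```

With `__spec__` present but set to `None`, the attribute access succeeds and the `getattr` default applies, so the lookup no longer raises.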

The interactive case

I believe @cameronraysmith refers here to executing the cells inside VS Code using the Quarto extension. Although this is then a bug to be reported at quarto-dev/quarto, I'll document here what I found.

The problem seems to be happening in the way that the VS Code Jupyter extension communicates the global module to the subprocess. Quarto's VS Code extension uses the Jupyter extension for interactive cell execution. In this situation, somehow the Jupyter extension communicates to spawn.py that the main module is actually the .qmd file that contains the cells. I don't know 1) why this is necessary or 2) how the Jupyter extension manages to do the right thing when used directly by opening a .ipynb file in VS Code and executing the cells interactively.
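One possible mechanism, offered as an assumption rather than a diagnosis: in CPython's spawn.py, `get_preparation_data` appears to fall back to the `__file__` attribute of `__main__` when no module name is available from `__spec__`, and that path is what the child later receives as `init_main_from_path`. The sketch below is a simplified paraphrase of that selection, not the library's code. `choose_main_init` is a hypothetical name, the path is a placeholder, and the real function handles more cases (Windows executables, relative paths).

```python
import types


def choose_main_init(main_module):
    """Simplified paraphrase of how the parent decides what the child re-imports."""
    # Raises AttributeError if __spec__ is missing entirely (the CLI failure above).
    spec_name = getattr(main_module.__spec__, "name", None)
    if spec_name is not None:
        return ("init_main_from_name", spec_name)
    main_path = getattr(main_module, "__file__", None)
    if main_path is not None:
        return ("init_main_from_path", main_path)
    return None  # leave __main__ alone in the child


# Stand-in for an interactive session whose __main__ claims the .qmd as its file.
interactive_main = types.ModuleType("__main__")
interactive_main.__spec__ = None
interactive_main.__file__ = "/path/to/template.qmd"  # placeholder path
selected = choose_main_init(interactive_main)

# Stand-in for a session whose __main__ has no __file__ at all.
plain_main = types.ModuleType("__main__")
nothing_selected = choose_main_init(plain_main)
```

If this reading is right, any front end that sets `__file__` on `__main__` to the document path would trigger the `runpy.run_path` call on the .qmd file, while one that leaves `__file__` unset would not. Whether the Jupyter extension actually does this has not been confirmed here.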

Author

cameronraysmith commented Mar 20, 2024

@cscheid many thanks for investigating this.

> Thanks for the report. I get a different error:

Yes, that's the error from what was formerly the "terminal traceback yields" details section of the OP. I swapped the order so that the error affecting the CLI appears at the top of this issue and the error that occurs in the VS Code extension appears at the top of quarto-dev/quarto#401.

> refers here to executing the cells inside VS code using the Quarto extension

Yes, that's correct.

> this is then a bug to be reported at quarto-dev/quarto, I'll document here what I found.

Happy to post a link to or copy of this issue on quarto-dev/quarto if you'd like me to do that.

Collaborator

cscheid commented Mar 20, 2024

That'd save me some time, thank you.

Collaborator

cscheid commented Mar 22, 2024

The quarto-cli part of the bug is fixed on main.

@cscheid cscheid closed this as completed Mar 22, 2024
Author

cameronraysmith commented Jul 28, 2024

@cscheid this issue has reappeared for me with Quarto 1.5.55 and Python 3.10.14.
The same minimal example from #9134 (comment), rendered with quarto render from the CLI, produces

AttributeError: module '__main__' has no attribute '__spec__'

Let me know if you can reproduce it, and whether you'd like to reopen this issue or have me create a new one. Thank you!

Collaborator

cscheid commented Aug 16, 2024

@cameronraysmith Sorry for the late response. I can repro, and more damningly, I see this: 2710c41

The proper fix actually does the right thing. This will go in 1.6. Sorry about the mess.

@cameronraysmith
Author

Not a problem @cscheid. Many thanks for following up!
I'll follow #10536 .
