Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Streamlit to 1.32.2 #783

Merged
merged 15 commits into from
Mar 15, 2024
Merged

Update Streamlit to 1.32.2 #783

merged 15 commits into from
Mar 15, 2024

Conversation

whitphx
Copy link
Owner

@whitphx whitphx commented Mar 12, 2024

TODO:

@whitphx
Copy link
Owner Author

whitphx commented Mar 12, 2024

Compare these:

Conditions:

  • M1 Mac, Sonoma 14.0
  • Arc Version 1.33.0 (47142), Chromium Engine Version 122.0.6261.112

Result:

  • https://docs.google.com/spreadsheets/d/1z_y0G_KldMg6JrrJ82cVU8e5NRTIo51S9ExEQa5IdkM/edit?usp=sharing
  • import streamlit.runtime and import streamlit.web (which may be unnecessary though) became very fast: ~3.5s -> ~0.35s.
  • Setting up the stlite-server (here) became slow somehow: ~2.7s -> ~5.2s
    • This might be because stlite still loads altair at initializaiton so Streamlit's lazy-loading doesn't make effects.
      def _fix_altair():
      """Fix an issue with Altair and the mocked pyarrow module of stlite."""
      try:
      from altair.utils import _importers # type: ignore[import]
      def _pyarrow_available():
      return False
      _importers.pyarrow_available = _pyarrow_available
      def _import_pyarrow_interchange():
      raise ImportError("Pyarrow is not available in stlite.")
      _importers.import_pyarrow_interchange = _import_pyarrow_interchange
      except ImportError:
      pass
      except Exception as e:
      logger.error("Failed to fix Altair", exc_info=e)
      • So these patches should be modified to be executed lazily. Maybe using something like importlib to be hooked at the initial import of the target modules?
    • For matplotlib, follow Use env variable to configure matplotlib backend streamlit/streamlit#8113

@lukasmasuch
Copy link
Contributor

This might be because stlite still loads altair at initializaiton so Streamlit's lazy-loading doesn't make effects.

It might be fine to leave it in bootstrap if the Remove the lazy-loaded packages such as altair from streamlit's requirements aspect gets implemented. If the user adds altair to requirements, the user likely wants to use altair in the app. In that case, having the additional loading time is probably fine.
Do you know how long it takes if altair is not installed?

@whitphx
Copy link
Owner Author

whitphx commented Mar 13, 2024

Thanks, it makes sense.
However I fixed _fix_altair before seeing your comment and it lazily patches altair when it's imported. I think this approach has some benefits such as

  • The loading time before the first view should be shortened as the loading time of altair is included in the first usage time of st.altair_chart.
  • This patch works even when altair is installed and imported after the initial load, which is kind of an edge-case.

I will measure the time of installing the packages soon anyway

@whitphx
Copy link
Owner Author

whitphx commented Mar 13, 2024

Removing the matplotlib patch resulted in reducing the booting up time from ~5.2s to ~4.9s.
https://docs.google.com/spreadsheets/d/1z_y0G_KldMg6JrrJ82cVU8e5NRTIo51S9ExEQa5IdkM/edit?hl=ja&pli=1#gid=0

@whitphx
Copy link
Owner Author

whitphx commented Mar 13, 2024

3f6b1ba introduced the lazy altair patching.
It reduced the booting-up time from ~4.9s to ~0.15s!

Overall, the init time became ~10s to ~4s.
Now the dominant parts are loading Pyodide and installing the wheels.

@whitphx
Copy link
Owner Author

whitphx commented Mar 13, 2024

After e2f668e which removed the click dependency,
there are 4 Streamlit's dependencies that can be omitted from the deps list.

  • altair: Used by st.altair_chart and all the built-in charts, ~6% usage.
  • pillow: Used by st.image and st.pyplot, and lazily loaded. ~30% usage.
  • tenacity: Used by st.connection and lazily loaded. ~1% usage.
  • toml: Used by config and secret toml parser.

(The usage data was provided by @lukasmasuch )

We should consider to remove them or not.
The benefits are reducing the initial loading payload size and loading time.
The downside is breaking the interoperability to Streamlit and inconvenience for the users who want to use these libraries where they have to declare such requirements explicitly.

Note:

@whitphx
Copy link
Owner Author

whitphx commented Mar 13, 2024

Measure the performance improvements when removing these packages

  1. Set the performance API in worker.ts around the package installation.
--- a/packages/kernel/src/worker.ts
+++ b/packages/kernel/src/worker.ts
@@ -172,6 +172,7 @@ async function loadPyodideAndPackages() {
   postProgressMessage("Installing packages.");
   await pyodide.loadPackage("micropip");
   const micropip = pyodide.pyimport("micropip");
+  performance.mark('start:install');
   if (wheels) {
     console.debug(
       "Installing the wheels:",
@@ -194,6 +195,8 @@ async function loadPyodideAndPackages() {
     await micropip.install.callKwargs(requirements, { keep_going: true });
     console.debug("Installed the requirements");
   }
+  performance.mark('end:install');
+  console.log("PERF:", performance.measure('install', 'start:install', 'end:install'));

   // The following code is necessary to avoid errors like  `NameError: name '_imp' is not defined`
   // at importing installed packages.
  1. Build the wheel file for each setting (modifying streamlit/lib/setup.py)
  2. Start the sharing dev server (cd packages/sharing & yarn start)
  3. Open the page and log the performance in each case.

Baseline (e2f668e)

pyodide.asm.js:9 Loading typing-extensions, cachetools, protobuf, numpy, pillow, fastparquet, cramjam, pandas, python-dateutil, six, pytz, fsspec, pyodide-http, jinja2, markupsafe, toolz, jsonschema, attrs, pyrsistent
92.0MB

  • 1341.2000000001863
  • 1339.2999999998137
  • 1343.8999999994412
  • 1431.1000000005588
  • 1387.2000000001863
  • 1366.7000000001863

Remove all:

86.5MB
pyodide.asm.js:9 Loading protobuf, pandas, numpy, python-dateutil, six, pytz, typing-extensions, fastparquet, cramjam, fsspec, cachetools, pyodide-http

  • 1029.6000000005588
  • 1004.7000000001863
  • 1030.2999999998137
  • 1053.3999999994412
  • 1064.2999999998137
  • 1197.5999999996275

Remove only altiar:

pyodide.asm.js:9 Loading pillow, cachetools, numpy, protobuf, typing-extensions, fastparquet, cramjam, pandas, python-dateutil, six, pytz, fsspec, pyodide-http
89.6MB

  • 1134.7999999998137
  • 1143
  • 1205.4000000003725
  • 1147.0999999996275
  • 1148.7999999998137
  • 1159.8999999994412

Remove only pillow:

pyodide.asm.js:9 Loading pandas, numpy, python-dateutil, six, pytz, typing-extensions, pyodide-http, cachetools, fastparquet, cramjam, fsspec, protobuf, jsonschema, attrs, pyrsistent, jinja2, markupsafe, toolz
89.0MB

  • 1252.7999999998137
  • 1318
  • 1312.1000000005588
  • 1334
  • 1329
  • 1307.1000000005588

Remove only tenacity:

pyodide.asm.js:9 Loading protobuf, pandas, numpy, python-dateutil, six, pytz, pillow, typing-extensions, fastparquet, cramjam, fsspec, pyodide-http, cachetools, jsonschema, attrs, pyrsistent, jinja2, markupsafe, toolz
91.9MB

  • 1324.5
  • 1319.5
  • 1312.2999999998137
  • 1369.5
  • 1445.9000000003725
  • 1389.5

Remove only toml:

pyodide.asm.js:9 packaging already loaded from default channel
18:45:43.076 pyodide.asm.js:9 Loading typing-extensions, fastparquet, cramjam, numpy, pandas, python-dateutil, six, pytz, fsspec, protobuf, pillow, pyodide-http, cachetools, jsonschema, attrs, pyrsistent, toolz, jinja2, markupsafe
92.0MB

  • 1343.3999999994412
  • 1390.3999999994412
  • 1390.3999999994412
  • 1410.6000000005588
  • 1482.7000000001863
  • 1359.699999999255

@whitphx
Copy link
Owner Author

whitphx commented Mar 13, 2024

  • Removing all the packages makes ~0.3s improvement.
  • Removing toml has almost no effect.
    • However we should remove it for less dependencies. Stlite doesn't use toml. -> Done: 376012d
  • Removing altair, pillow, and tenacity is a hard decision... Is it worth doing in these balances between performance improvement and users' inconvenience?
    • I think we don't have to do this in this PR right away.

@lukasmasuch
Copy link
Contributor

Awesome analysis :)

Do you know where python-dateutil is coming from? It's one of the dependencies that Streamlit removed in the last version.

altair, pillow, and tenacity might be good candidates to remove once we find a good way for a lazy install feature.

Copy link
Contributor

@lukasmasuch lukasmasuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 But 1.32.1 still has an issue causing significant CPU load. We will likely roll out the 1.32.2 patch fixing this in a couple of hours.

@whitphx
Copy link
Owner Author

whitphx commented Mar 14, 2024

@lukasmasuch Thanks

Do you know where python-dateutil is coming from?

Looks like it's a pandas' dependency.
Even with normal Streamlit, python-dateutil is installed along with pandas.

altair, pillow, and tenacity might be good candidates to remove once we find a good way for a lazy install feature.

Indeed!

We will likely roll out the 1.32.2 patch fixing this in a couple of hours.

Thanks! Will wait for and merge it.

@whitphx
Copy link
Owner Author

whitphx commented Mar 14, 2024

I came up with this idea of lazy-install -> whitphx/streamlit#9

@whitphx whitphx changed the title Update Streamlit to 1.32.1 Update Streamlit to 1.32.2 Mar 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants