Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce dependencies for running migration #674

Closed
bersace opened this issue Mar 26, 2020 · 28 comments
Closed

Reduce dependencies for running migration #674

bersace opened this issue Mar 26, 2020 · 28 comments
Labels
Milestone

Comments

@bersace
Copy link

bersace commented Mar 26, 2020

Hi,

Alembic requires Mako and python-editor. I suspect that alembic requires theses libraries only for alembic revision command. This command is useful only for developer.

When using alembic to apply migration on production, we don't need to create new revisions. Thus, theses dependencies are useless on production.

Would it be possible to have a subset of alembic feature for production deployement including only upgradate, downgrade and other introspection command without init, revision and other development commands ?

Regards,

@bersace
Copy link
Author

bersace commented Mar 26, 2020

Related to #126

@zzzeek
Copy link
Member

zzzeek commented Mar 26, 2020

I was not at all happy about adding the "editor" dependency however I am not famliar with a Python installation system that can selectively install dependencies based on what commands someone might want to use. if someone wants to install "alembic" and have all the commands that are advertised work correctly, then these two dependencies are needed.

I don't see why it's an issue for a library to have a few very small dependencies? if you are performing some kind of special distribution, you can omit these dependencies yourself using somehting like "pip install --no-deps".

again I'm not familiar with how one installs an app and selectively sets up different dependencies and I dont have the resources to look into it. feel free to suggest how this is done.

@bersace
Copy link
Author

bersace commented Mar 26, 2020

@zzzeek distutils provides extras_require feature to declare a set of optionnal dependencies. For example Dramatiq project uses this to enable code watching and autoreloading for development environment only.

Once you have optionnal dependencies declared, your code can handle the presence of dependency as a feature flag. Either you hide the command or the command fails with a proper message missing dependency X. ImportError is enough to catch the absence of dependency.

This may look like this :

$ pip install alembic
...
Successfully installed SQLAlchemy-1.3.15 alembic-1.4.2 python-dateutil-2.8.1
$ alembic init .
Missing maintainer dependencies. Install alembic[maintainer] to enable init command.
$ pip install alembic[maintainer]
...
Successfully installed Mako-1.1.2 MarkupSafe-1.1.1 python-editor-1.0.4 six-1.14.0
$ alembic init .
...
  Please edit configuration/connection/logging settings in './alembic.ini' before proceeding.
$

RPM/Deb packaging could handle this with either a meta-package or suggest dependencies or plain installation of optionnal dependencies.

@zzzeek
Copy link
Member

zzzeek commented Mar 26, 2020

if someone does "pip install alembic" it must install all dependencies. I cannot expose this detail to the vast majority of users who don't care about this.

Again I'm really not sure the risk of changing how the installation works, given that we just had a debacle for merely installing pyproject.toml, is worth it. I don't understand why it's so critical that you not have these two very small dependencies follow along unless you are installing on a cellphone or something.

@bersace
Copy link
Author

bersace commented Mar 26, 2020

Another option is to extract alembic-core that one can use alone on production and everyone uses alembic as usual. alembic requires alembic-core, making the change transparent.

It's so critical that you not have these two very small dependencies

I won't call it critical too. Theses dependencies have their own sub-dependencies. In container world, tinier images are better from different perspectives : cost of update, security, etc. Also, this would avoid issue such as #126 .

@zzzeek
Copy link
Member

zzzeek commented Mar 26, 2020

#126 was a transient issue due to some markupsafe stuff. the "editor" dependency is something like a 20 line python script. when you run your container, there is way more space taken up by all kinds of linux binary stuff as well as the entire Python standard library ("batteries included", after all) that you also aren't using. unless you have some kind of Python install that doesnt include everything here https://docs.python.org/3/library/ like unittest.mock, tkinter.ttk, optparse, bdb, hundreds of things that you likely aren't using, that's all in your container too. Alembic is not the big culprit here.

@CaselIT
Copy link
Member

CaselIT commented Mar 26, 2020

distutils provides extras_require feature to declare a set of optionnal dependencies.

Does extras_require only allows adding extra requirements or also allows specifying a subset?
I guess that if subsets are supported, a configuration with only the dependencies for "production" could be created?

Work would still be needed to make sure alembic does not break if some requirement is missing, but it's surely better that breaking everyone install by changing the default install

@bersace
Copy link
Author

bersace commented Mar 26, 2020

$ du -cshx {alembic,Mako,mako,python_editor,editor,MarkupSafe,markupsafe}*
1,5M    alembic
40K     alembic-1.4.2.dist-info
36K     Mako-1.1.2.dist-info
612K    mako
32K     python_editor-1.0.4.dist-info
4,0K    editor.py
4,0K    editor.pyc
28K     MarkupSafe-1.1.1.dist-info
108K    markupsafe
2,3M    total

Thanks Mako does not requires beaker anymore. Third of the size of Alembic is useless on production. I find that costy. If every libraries does the same, it becomes painful. I'm thinking of web framework, message queue libraries, SDKs, etc.

I don't want to be harsh. I'm just suggesting that on production, having an alembic-core like library would be appreciated.

@zzzeek
Copy link
Member

zzzeek commented Mar 26, 2020

If you can illustrate how to do it such that normal instalations are unaffected, that's a start.

however you have not explained why Mako, Markupsafe, and editor are so critical space-wise, but for example these randomly chosen Python standard library modules are not:


[classic@photon3 python3.7]$ du -cshx {xml,unittest,optparse,imp,csv,getpass,curses,email}*
1008K	xml
288K	xmlrpc
632K	unittest
60K	optparse.py
464K	importlib
12K	imp.py
16K	csv.py
8.0K	getpass.py
84K	curses
1.2M	email
3.7M	total

I'm pretty sure I can find another 10-20M of things you arent using from the Python standard library, how are you approaching that much bigger waste of space?

@bersace
Copy link
Author

bersace commented Mar 26, 2020

@zzzeek yup, stdlib has some garbage too. I won't open an issue on alembic project for this ;-) Actually, there is a PEP for this : https://www.python.org/dev/peps/pep-0594/

I don't buy the argument that given X waste 10MB, it's fine to waste another 1MB.

If you can illustrate how to do it such that normal instalations are unaffected, that's a start.

Here it is. Extract alembic-core python library from alembic and make alembic depends on it. Project wanting to ship only alembic-core won't have some commands.

$ pip install alembic-core
...
Successfully installed SQLAlchemy-1.3.15 alembic-1.4.2 python-dateutil-2.8.1
$ alembic init .
init command not implemented.
You should install alembic package.
$ pip install alembic
...
Successfully installed Mako-1.1.2 MarkupSafe-1.1.1 python-editor-1.0.4 six-1.14.0
$ alembic init .
...
  Please edit configuration/connection/logging settings in './alembic.ini' before proceeding.
$

Installing alembic requires alembic-core. That's transparent. You'll find such architecture in e.g. botocore vs boto vs awscli.

@zzzeek
Copy link
Member

zzzeek commented Mar 26, 2020

I don't have the resources to split alembic and maintain two packages. I would prefer if there were setuptools options to allow the single package to forego dependencies only if a positive intent is stated.

@CaselIT
Copy link
Member

CaselIT commented Mar 26, 2020

I don't buy the argument that given X waste 10MB, it's fine to waste another 1MB.

While I agree you here in general, maybe your use case would be best served by having two containers, ones that does the upgrade and another that only has the packages required to run the application.
There may be other packages that are only required while upgrading/developing your application that could be removed this way.

@jvanasco
Copy link
Member

Although Alembic (and the parent project, SqlAlchemy) are some of the more widely used Python libraries... the group of people developing and maintaining it is rather small. It is hard for me to imagine how @zzzeek could prioritize working on this feature above anything else.

I certainly see the benefit of having a smaller library for users who only need to run the migrations, but the core users of alembic have been developers who read and write migrations. Aside from the overhead of splitting out the library and maintaining two packages - which would also necessitate a lot of work to have them correctly share the alembiccommand`, a drastic change like this would inconvenience many users. A change like this would absolutely break several of my build and CI systems.

I also don't like using extras_require for this, because the extra components are added - not subtracted. This too would break existing systems and usage patterns - users would need to specify alembic[writer] instead of alembic.

Right now, a change like this isn't going to significantly benefit many users, is likely to cause many problems for most users, and will require quite a large effort to make happen. I am 👎

Another possible solution (beyond what @CaselIT suggests) for people who really need this, is to vendor a copy of Alembic into your source/distribution. That would allow you to explicitly control the requirements.

@bersace
Copy link
Author

bersace commented Mar 26, 2020

your use case would be best served by having two containers

@CaselIT I'm using Alembic to ensure schema is up to date on startup.

@bersace
Copy link
Author

bersace commented Mar 26, 2020

vendor a copy of Alembic into your source/distribution.

Yep, I considered this option. Just wanted to talk about this before.

@CaselIT
Copy link
Member

CaselIT commented Mar 26, 2020

I've given a look at the alembic code:

  • python-editor: there should not be problems to run without this one. It is imported once inside a function.
  • python-dateutil: used only in alembic\script\base.py when running generate revision. Should be easy to refactor alembic to import it in the function
  • make: is used in compat and utils modules, so I think it would require more work to make this optional.

So python-dateutil should be easy to make optional, Mako not so much, python-editor is already optional.

Edit: By optional I mean that when installing alembic with --no-deps these two packages would not be required for alembic to work in a limited faction. A normal install would still install them

@CaselIT
Copy link
Member

CaselIT commented Mar 26, 2020

I'm using Alembic to ensure schema is up to date on startup.

BTW when using this pattern, make sure that only one container starts at once. The db may not like it if more than one client tries to update the schema at the same time

@CaselIT
Copy link
Member

CaselIT commented Mar 26, 2020

This is a diff to make python-dateutil optional

diff --git a/alembic/script/base.py b/alembic/script/base.py
index fea9e87..84fa637 100644
--- a/alembic/script/base.py
+++ b/alembic/script/base.py
@@ -4,8 +4,6 @@ import os
 import re
 import shutil
 
-from dateutil import tz
-
 from . import revision
 from . import write_hooks
 from .. import util
@@ -514,6 +512,8 @@ class ScriptDirectory(object):
 
     def _generate_create_date(self):
         if self.timezone is not None:
+            from dateutil import tz
+
             # First, assume correct capitalization
             tzinfo = tz.gettz(self.timezone)
             if tzinfo is None:

@CaselIT
Copy link
Member

CaselIT commented Mar 27, 2020

@zzzeek does the above patch makes sense for alembic?

I'll provided a proper change set in that case

@zzzeek
Copy link
Member

zzzeek commented Mar 27, 2020

We should check all three of python-dateutil, mako, and editor and either make all three of them local imports, or run them with a catch of ImportError.

@CaselIT
Copy link
Member

CaselIT commented Mar 27, 2020

Mako is the hardest to make optional as indicated in my previous comment, and I'm not sure if the effort is warranted.

Using the change above, alembic would require only sqlalchemy and mako to execute.
I still don't think we should change the setup requirements, simply better support installs with --no-deps

@jvanasco
Copy link
Member

@CaselIT

I would prefer to see this change as something like:

-from dateutil import tz
+try:
+    from dateutil import tz
+except ImportError:
+    # support for minimized installation with `--no-deps`
+    pass

Nested imports like you suggested have caused me too many problems in code audits before.

@zzzeek
Copy link
Member

zzzeek commented Mar 27, 2020

I like that better however we need to worry about the degradation path if one of those commands tries to run, it will have a misleading error message about "tz not defined" rather than ImportError.

@zzzeek
Copy link
Member

zzzeek commented Mar 27, 2020

see this is where I strat writing those library functions like "optional_imports("datetime.tz") or something else that everyone will be confused by five years later

@CaselIT
Copy link
Member

CaselIT commented Mar 27, 2020

for tz, since it's used in a single place, we can add a if tz is None: raise ImportError(...)

@zzzeek
Copy link
Member

zzzeek commented Mar 29, 2020

right but now we have this pattern where it is:

try:
    import thing
except ImportError
   thing = None


# everywhere thing is used

def use_thing():
   if thing is none:
      raise ImportError
   thing.use()


that's too many moving parts to keep straight. this kind of thing, an "optional depedency", needs to be declared in one place and one place only. which means either a. build one of my signature optional_dependency("thing") systems, which are quite simple, no big deal, however IDEs dont like them, or b. we just import the libraries inside the defs that use them.

@CaselIT
Copy link
Member

CaselIT commented Mar 29, 2020

Indeed. My suggestion was limited to dateutils since it's used in only one place, so it would not be a big burden to move the import in the function or do the try/catch

@CaselIT CaselIT added this to the 1.7 milestone Jun 10, 2021
@sqla-tester
Copy link
Collaborator

CaselIT has proposed a fix for this issue in the master branch:

Revendor editor and make dateutil optional https://gerrit.sqlalchemy.org/c/sqlalchemy/alembic/+/2872

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants