Log `git checkout` and expose to users #3520

humitos · 2018-01-15T20:57:33Z

I've been taking a look at the code and I found some consideration that we need to take a look:

BaseVCS class has it owns run method that works in a similar way than BuildEnvironment.run
- the log_tmpl string differs
- it adds some configurations automatically, like the cwd path
All commands ran by BaseVCS are not logged as a BuildCommand
BuildCommand needs to be instantiated with build_env to make be able to .save()/record the command for the current Build object.
build_env could be LocalEnvironment or DockerEnvironment

Depending on how much we want to refactor, a minimalistic solution would involve:

pass setup_env to update_imported_docs
pass this env to version_repo.checkout
under checkout consider use if that env if not None
env.run with proper cwd

This is the what this PR implements but, this leaves a mix of calls to run method between BaseVCS and LocalEnvironment for the same purpose: update the repository's code. So, maybe we should consider to just pass the setup_env to the init of BaseVCS and execute everything under setup_env.

The problem of using always the setup_env (or maybe a new one called vcs_env) is that the record affects all the commands ran under that env and we want to record only clone and checkout commands --the run method could receive an optional record to override the default behaviour.

It's a good starting point to discuss a little about this.

This PR only affects to the git backend, that's why I think we need to think in a general solution under the BaseVCS class.

Closes #3397

humitos · 2018-01-15T21:08:21Z

Also, we may want to remove the --quiet option from the git command?

...

but if it fails...

humitos · 2018-01-15T21:11:34Z

readthedocs/vcs_support/backends/git.py

@@ -173,7 +189,7 @@ def checkout(self, identifier=None):
            self.fetch()


We probably needs to pass the env here, also 😞

ericholscher · 2018-01-16T00:38:48Z

I think we should be able to set a per-command option to choose whether to record that call ever. It seems like we haven't needed that in the past, but for the VCS stuff we'll want to have a subset of things that aren't recorded.

One question, is there a reason we really don't want all the VCS commands recorded? It might be extra info, but it doesn't seem inherently bad to show them.

ericholscher

I think we should remove the BaseCLI:run() command from the VCS tools, and move to always just proxying those commands to the Environment:run(). We really shouldn't have two different ways of running commands in our build process, and the environments are a better abstraction.

ericholscher · 2018-01-16T00:39:51Z

readthedocs/projects/tasks.py

@@ -593,7 +593,7 @@ def update_imported_docs(version_pk):
                version_slug = version.slug
                version_repo = project.vcs_repo(version_slug)

-                ret_dict['checkout'] = version_repo.checkout(version.identifier)
+                ret_dict['checkout'] = version_repo.checkout(version.identifier, env)


Do we need the setup_env here, or could we just create a new vcs_env inside checkout?

Seems we could pass it into the original call to project.vcs_repo(version_slug) in the line above, and stored on the repo object?

I'm not suggesting this btw, just thinking out loud. Is it worth having a vcs_env we use for all commands, or is passing in the setup_env correct? I'm not sure!

My initial thought was to just use the setup_env. The difference between setup_env and build_env is just that one is executed on the host, and one in the vm. Separating this further might make for more places to check for env exception vs env failure vs env pass as well, so reusing setup_env ends up cleaner too.

Do you have a scenario that you think it might help to have a separate env?

I think passing env into the VCS backend instantiation is the correct pattern. I envisioned completely removing the run() method on the vcs backends, as we shouldn't be duplicating code there. Perhaps there are some commands that are not important to the user to expose in the UI and we can not record those, but executing all commands through the setup_env is important for this feature.

The setup_env is not used for VCS checkouts since they are using their own run. In fact, the function/task didn't receive the setup_env, I had to added to it.

That could be a problem because that task (update_imported_docs) is called from a couple of places where a setup_env doesn't exist (e.g. from core/views/hooks.py). This could be one of the reason of why a vcs_env could be a better approach: we could create it inside the Project.vcs_repo instead of passing around through all the code or forcing other calls to create this environment just to log the commands.

Something that we need to consider if we want to create a new vcs_env is that the envs are created with this dependencies:

self.project

self.version

self.config (from YAML)

self.build

record

env_vars. From get_env_vars, which depends on

self.version

self.project

self.config

In case we want to create the environment inside the Project.vcs_repo will need to consider this dependencies.

I took a look at this, and I found that the only missing piece here is the build_pk (originally passed from core.utils.trigger_build after creating an Build object).

Solution 1

So, I'd say that this method update_imported_docs could receive the version_pk and build_pk (but when called from UpdateDocsTask we will be repeating a couple of actions like parsing the YAML and more) and create our own vcs_env for this. If build_pk is None we won't record (False) the commads since we don't know where to record them. This will allow other uses of this function to keep working.

Solution 2

Refactor all the things!

move update_imported_docs as a method of UpdateDocsTask

fix test_celery.py:test_update_imported_docs

refactor/remove core.view.hooks._build_url function that uses the function update_imported_docs

refactor/remove github_build, gitlab_build and bitbucket_build functions that use the _build_url but the docstrings say those functions are deprecated in favor of readthedocs.restapi.views.integrations classes (these classes use core.views.hooks._build_version which uses trigger_build which uses UpdateDocsTask --so we are safe)

I think the last solution is a better plan, but I may missing a lot of things in the middle :)

We found that there are users using the old webhook endpoints. Only projects with Feature.ALLOW_DEPRECATED_WEBHOOK can hit these ones. But anyway, we can't remove them yet.

So, the refactor would include changing calls to projects.tasks.update_imported_docs,

https://github.com/rtfd/readthedocs.org/blob/445683fc0c86b0de56779447a8eedf98024dba32/readthedocs/core/views/hooks.py#L125-L129

by core.utils.trigger_build which will end up calling UpdateDocsTask in a proper way and "everything should keep working"

I think we need think more about why we are calling into this function from other places, and build a supported API for that use case.

I believe all the other calls are so that we can re-sync the checkout on disk and the DB (to update versions on a push, for example). We should find a way to make that supported explicitly in the build tooling I think, so that we don't have to maintain a bunch of different logic. I think the best approach here is to have some kind of mechanism on the doc builds to say "Do a resync of all the on-disk stuff, but don't actually build the docs", which is mostly what your Solution 2 is talking about.

ericholscher · 2018-01-16T00:39:53Z

readthedocs/projects/tasks.py

@@ -335,7 +335,7 @@ def setup_vcs(self):
        self.setup_env.update_build(state=BUILD_STATE_CLONING)

        self._log(msg='Updating docs from VCS')
-        update_imported_docs(self.version.pk)
+        update_imported_docs(self.version.pk, self.setup_env)


Would like to see kwargs here. I'd love to get to a world where we're explicitly using kwargs in all function/method calls, at least until we can add type checking to the calls :)

We can also just drop this arg. Other methods use the class local self.setup_env, so it might make sense to continue assuming this state exists.

I don't particularly like this pattern, but its at least consistent.

update_imported_docs is a function outside this UpdateDocsTask class (it's used from other places). See #3520 (comment)

agjohnson

The output is a huge improvement, I'm really excited to have this output included and unified with the rest of the command output finally!

I'll echo the sentiment to continue cleaning up the vcs backend code and completely remove the secondary run() method. I'll also echo the uncertainty of displaying all the commands, could you post some screenshots of the commands all executed through the build environment?

agjohnson · 2018-01-16T04:51:26Z

readthedocs/projects/tasks.py

@@ -335,7 +335,7 @@ def setup_vcs(self):
        self.setup_env.update_build(state=BUILD_STATE_CLONING)

        self._log(msg='Updating docs from VCS')
-        update_imported_docs(self.version.pk)
+        update_imported_docs(self.version.pk, self.setup_env)


We can also just drop this arg. Other methods use the class local self.setup_env, so it might make sense to continue assuming this state exists.

I don't particularly like this pattern, but its at least consistent.

agjohnson · 2018-01-16T04:55:42Z

readthedocs/projects/tasks.py

@@ -593,7 +593,7 @@ def update_imported_docs(version_pk):
                version_slug = version.slug
                version_repo = project.vcs_repo(version_slug)

-                ret_dict['checkout'] = version_repo.checkout(version.identifier)
+                ret_dict['checkout'] = version_repo.checkout(version.identifier, env)


I think passing env into the VCS backend instantiation is the correct pattern. I envisioned completely removing the run() method on the vcs backends, as we shouldn't be duplicating code there. Perhaps there are some commands that are not important to the user to expose in the UI and we can not record those, but executing all commands through the setup_env is important for this feature.

agjohnson · 2018-01-16T05:04:39Z

readthedocs/vcs_support/backends/git.py

+            code, out, err = build_cmd.exit_code, build_cmd.output, build_cmd.error
+        else:
+            code, out, err = self.run('git', 'checkout',
+                                      '--force', '--quiet', revision)


Based on your screenshots, --quiet is not what we want it seems. The output without --quiet was really helpful. So 👍 on removing that from the command.

agjohnson · 2018-01-16T05:32:08Z

One more thought:

build_env could be LocalEnvironment or DockerEnvironment

We probably won't want to support this. In the case of our commercial hosting, this step is specifically kept out of the Docker build environment so that the build environment has no read access to the source deploy keys.

ericholscher · 2018-01-16T18:38:22Z

@agjohnson I'd really like to run the VCS calls inside Docker on the .org though, so I think having it be configurable by a setting or similar is good.

humitos · 2018-01-17T21:04:42Z

I'll also echo the uncertainty of displaying all the commands, could you post some screenshots of the commands all executed through the build environment?

I'm working on running all the commands inside a BuildEnvironment. These are the commands executed in an already imported project (for example when a webhook comes) so far:

In [4]: vcs = Project.objects.get(slug='poliastro').vcs_repo()

In [5]: vcs.checkout()
[17/Jan/2018 15:01:55] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git status' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:55] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git remote set-url origin https://github.com/poliastro/poliastro' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:55] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git fetch --tags --prune' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git show-ref remotes/origin/master' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git checkout --force --quiet origin/master' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git clean -d -f -f' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git submodule sync' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git submodule update --init --recursive --force' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
Out[5]: (0, '', None)

humitos · 2018-01-18T17:50:40Z

I just updated this PR with several changes implementing the idea that I commented you both. This is not a final solution but some code to discuss over something concrete.

Summarizing, the idea is to have a BaseEnvironment which doesn't depend on a Build where we can run arbitrary commands: this is the case of some VCS command that we need to run outside a build.

This code will probably needs some polishing and arrangement between us, but it shows the idea I think.

@agjohnson screenshot for all the VCS commands recorded

humitos · 2018-01-18T17:55:45Z

Initial import...

Note that the first command failed, but it's under warn_only so I think we shouldn't record those ones to avoid this problems.

humitos · 2018-01-18T18:00:03Z

Just imported repo without logging the warn_only commands:

humitos

I left some comments where I have doubts or where I wanted to explain a little more about that change.

Anyway, this code still need work for sure but I will wait for your comments before continue working on this.

cc @ericholscher @agjohnson

humitos · 2018-01-19T00:31:00Z

readthedocs/doc_builder/environments.py

+    def _log_warn_only(self, msg):
+        log.warn(LOG_TEMPLATE.format(
+            project=self.project.slug,
+            version='latest',


I don't like this, but I don't want to depend on a version instance to just get the slug. I would like to think in a better way to implement it, though

humitos · 2018-01-19T00:31:54Z

readthedocs/doc_builder/environments.py

+        # :'(
+        log.warn(LOG_TEMPLATE.format(
+            project=self.project.slug,
+            version=self.version.slug,


This one can use the version object because the *BuildEnvironment are instantiated with this object. No real problem here.

humitos · 2018-01-19T00:32:33Z

readthedocs/doc_builder/python_environments.py

@@ -163,7 +163,7 @@ def save_environment_json(self):
        with open(self.environment_json_path(), 'w') as fpath:
            # Compatibility for Py2 and Py3. ``io.TextIOWrapper`` expects
            # unicode but ``json.dumps`` returns str in Py2.
-            fpath.write(unicode(json.dumps(data)))
+            fpath.write(json.dumps(data))


Someone proposed to use six.text_type, but I didn't try it yet

humitos · 2018-01-19T00:33:32Z

readthedocs/projects/tasks.py

    """

    max_retries = 5
    default_retry_delay = (7 * 60)
    name = __name__ + '.update_docs'

+    # TODO: the argument from the __init__ are used only in tests


I noted this because I found these non-required attributes kind of confusing

Perhaps some docs should exist on these? Optionally, we could use a separate pattern for this, such as a separate class method used just for testing. I have no strong opinions here though, just throwing out some options.

humitos · 2018-01-19T00:35:21Z

readthedocs/projects/tasks.py

+                version_repo = self.project.vcs_repo(
+                    self.version.slug,
+                    # When ``sync_only`` we don't a setup_env
+                    getattr(self, 'setup_env', None),


This is also kind of hacky: sync_repo can be executed before (sync_only=True) or after the run_setup (the one that creates the self.setup_env we want to use here in case it exists)

humitos · 2018-01-19T00:36:49Z

readthedocs/vcs_support/backends/bzr.py

@@ -21,7 +22,7 @@ class Backend(BaseVCS):

    def update(self):
        super(Backend, self).update()
-        retcode = self.run('bzr', 'status')[0]
+        retcode = self.run('bzr', 'status', warn_only=True)[0]


This command is used to check the existence of the repository and call pull or clone for example.

humitos · 2018-01-19T00:37:21Z

readthedocs/vcs_support/backends/git.py

@@ -66,21 +66,21 @@ def checkout_revision(self, revision=None):
            revision = 'origin/%s' % branch

        code, out, err = self.run('git', 'checkout',
-                                  '--force', '--quiet', revision)
+                           '--force', revision)


Since we are exposing the output to the user, I removed all the quiet arguments.

humitos · 2018-01-19T00:37:43Z

readthedocs/vcs_support/backends/hg.py

@@ -23,7 +24,7 @@ def pull(self):
        (pull_retcode, _, _) = self.run('hg', 'pull')
        if pull_retcode != 0:
            raise RepositoryError
-        (update_retcode, stdout, stderr) = self.run('hg', 'update', '-C')
+        (update_retcode, stdout, stderr) = self.run('hg', 'update', '--clean')


I expand the arguments to use the long form which is clearer.

humitos · 2018-01-19T00:38:40Z

readthedocs/vcs_support/base.py


    """
    Base for VCS Classes.

-    Built on top of the BaseCLI.
+    VCS commands are ran inside a ``LocalEnvironment``.


In fact, could be any Environment not just Local. By default, is LocalEnvironment

humitos · 2018-01-19T00:39:19Z

readthedocs/vcs_support/base.py

+        environment = os.environ.copy()
+
+        # TODO: kind of a hack
+        del environment['PATH']


I need to delete this one because there is an assert in the code that doesn't allow the PATH in the environment

Probably best anyways? I think we want tight control over this.

agjohnson · 2018-01-22T16:47:47Z

Now that I've seen how many commands are raised to the user, I feel like there is perhaps some filtering of commands that we need to do. I think it's important to reduce redundancy in the implementations and use a central execution pattern, but it probably isn't useful for users to see all of these commands.

For instance, if we do a git status to check for a repository, this shouldn't be surfaced to the user, and it shouldn't show as a failed command, as it isn't important from the user's perspective. Commands like git checkout and git subodule ... are important though.

This will probably also get around the issue that a command like git status will normally fail and execution will continue regardless. We don't need to show this as a failure to the user.

humitos · 2018-01-22T20:38:43Z

To add another place where we are running commands outside a Build and even outside an Environment, I just found this:

https://github.com/rtfd/readthedocs.org/blob/1af444173481df9a7b16e015fc6b12abe8155e7e/readthedocs/projects/utils.py#L48-L107

(used for symlinks, for example)

agjohnson · 2018-01-23T04:52:15Z

Yeah, that's the other one that needs to be unified. Let's address that separately though. I'll open a ticket for that.

agjohnson

This looks great! I could use some more explanation around moving the update docs task into the main build task, as it seems like a conditional use of the main task. Maybe all that is required here is an abstraction though.

We already discussed the addition of a record on run(), but i agree the verbose cli options is a necessary addition for all of the vcs commands. It makes the user view more obvious and makes it easier for development as well.

agjohnson · 2018-01-23T09:02:21Z

readthedocs/doc_builder/environments.py

+
+class LocalEnvironment(BaseEnvironment):
+
+    # TODO: BuildCommand name doesn't make sense here, should be just Command


True, this meaning has changed now. 👍 on changing

However, maybe this is a later refactor after all. If we generalize the naming, we should probably also move out of doc_builder/ app.

Opened a ticket for this: #3541

agjohnson · 2018-01-23T10:13:24Z

readthedocs/doc_builder/environments.py

+        kwargs['environment'] = self.environment
+
+        # TODO: ``build_env`` is passed as ``kwargs`` when it's called from a ``*BuildEnvironment``
+        # kwargs['build_env'] = self


I don't quite follow this comment, but I assume this note makes more sense for you :)

I wanted to leave that comment to discuss a little about this.

build_env was self, but now if it comes, it's from the overriden run method from the class that is calling this method.

Sounds a little elaborated, that's why I'd appreciate your thoughts on this.

agjohnson · 2018-01-23T10:15:34Z

readthedocs/doc_builder/environments.py

+        self.commands.append(build_cmd)
+        build_cmd.run()
+
+        # TODO: I don't like how it's handled this entry point here


What is your concern here? I can see how warn_only would be confusing as we're now considering several different return states:

Command passed, report

Command failed, report

Command failed, report and fail the build

Command passed, don't report

Command failed, don't report

Command failed, don't report but fail the build (this probably shouldn't be used ever, but maybe there are steps that should raise a general error?)

Command failed, but report as pass? (I don't know if this still exists though)

This TODO comment was moved accidentally. It belongs to this line: https://github.com/rtfd/readthedocs.org/pull/3520/files/7f6c098983acb4bb3afc90e5c780af7dae9ec9d8#r163242699

I agree with the return states, but I'd say that the last two shouldn't exist at all. The rest, are needed.

Command failed, but report as pass? (I don't know if this still exists though)

This is tricky, but it does exist. Although, we are not using it but maybe we need it.

For example, hg tags command. I suppose that we want to record=True it but we want to warn_only=True also, since it could fail but we don't want the Build to fail. Besides, if it's recorded and failed (exit status != 0) it will be presented in red to the user and we don't want that.

Is that a good example of this use case?

Sumarizing, I think we want the warn_only commands to be recorded but don't interfers in the build status nor in the command itself (when presented to the user).

Maybe what we need to change is the BuildCommandResultMixin.successful property to something like:

return self.exit_code == 0 and self.warn_only == False

but that would involve to save the warn_only in the database also and I don't know if this is getting more and more complicated :/

Another possible idea is that if warn_only we record it, but we set the exit_code as 0 (this could end up with other problems like the Latex build fail and we will never know it, even when opening the Build interface since it won't be marked as failed -in red).

So, add another attribute? warn_only and force_success 😛 ?

👍 Its sounding like we also want a force_success option as well here.

Meh, I just realized that we need something like record_as_success instead of force_success because we are using git status exit code to decide if clone or pull, so we need the real exit code in the flow but we want to save a fake one in the databse...

agjohnson · 2018-01-23T10:16:39Z

readthedocs/doc_builder/environments.py

+        if not warn_only:
+            # TODO: maybe receive ``record`` as an attribute for skip/record
+            # just specific commands but not all of them ran under the
+            # *BuildEnvironment


This note sounds like we want a record kwarg on command execution? I don't quite understand the comment on *BuildEnvironment though.

Hehe.

Yes, I was thinking on having the record attribute passed in the run method to override the one from the __init__ from *BuildEnvironment classes (now we have LocalEnvironment and (Local/Docker)BuildEnvironment)

Yeah, i think a record arg on run() makes sense. Given __init__(record=False, ...), and run(record=True, ...), what do think we should do? I'd probably say don't record if __init__(record=False), thoughts?

I think it's better to follow what run(record=) is set independently how was the class initialized.

Why? Because sometimes the LocalEnvironment is initialized far away (in another file and just passed as argument) from the code where the command is ran and you want/need to override the default record.

I implemented this at https://github.com/rtfd/readthedocs.org/blob/e1766a4d1061f7e97c75039b84a196088b111164/readthedocs/doc_builder/environments.py#L310-L312

agjohnson · 2018-01-23T10:17:26Z

readthedocs/doc_builder/environments.py

+            # TODO: do we want to save commands that FAILED but not raised and
+            # exception?  This will cause the first `git status` (when
+            # importing) to fail and be marked with RED in the Build command
+            # details


Raised in general review of this, I'd say having a record option that wouldn't record this command at all is a better pattern.

We need both: record=False and warn_only=True In fact, I'm thinking that record=False should implicit mean warn_only=True because the False case doesn't make sense.

Oh, good point. I think you're right.

agjohnson · 2018-01-23T10:51:11Z

readthedocs/projects/tasks.py

    """

    max_retries = 5
    default_retry_delay = (7 * 60)
    name = __name__ + '.update_docs'

+    # TODO: the argument from the __init__ are used only in tests


Perhaps some docs should exist on these? Optionally, we could use a separate pattern for this, such as a separate class method used just for testing. I have no strong opinions here though, just throwing out some options.

agjohnson · 2018-01-23T10:51:57Z

readthedocs/projects/tasks.py

        """
-        Run a documentation build.
+        Run a documentation sync or sync n' build.


What does sync mean in this context?

sync means: "update docs from VCS (sync tags/branches and trigger stable build if the stable version has changed)"

Where sync is not a good summary for that :(

Roger, this makes sense.

We do describe a lot of things as syncing (we also overuse the words backend and environment). This doesn't need to change though, I just was looking for some clarification to help understand the changes.

agjohnson · 2018-01-23T10:54:49Z

readthedocs/projects/tasks.py

@@ -149,9 +145,15 @@ def run(self, pk, version_pk=None, build_pk=None, record=True,
            self.build_force = force
            self.config = None

+            if sync_only:
+                self.sync_repo()
+                return True


Regarding my note above on the significance of sync, is there a strong reason to make this task conditionally operate like this? It seems like this should be a separate task instead?

If the purpose of this is code reuse, perhaps a mixin class that two tasks can share would make more sense.

If the sync operation requires the same amount of arguments/etc, then it's probably not a strong case for task isolation.

This is something that I talked to @ericholscher, and I saw it very clear at the beginning while we were talking, but the implementation is not as good as I imagined.

I wanted to avoid the needing of handling the LocalEnvironment inside the update_imported_docs and also try to avoid the "jump to a middle step of the flow": update_imported_docs is like the third step of the whole UpdateDocsTask and was used from outside also -I didn't like that.

Besides, the use of this sync_only from outside was done only from https://github.com/rtfd/readthedocs.org/blob/7f6c098983acb4bb3afc90e5c780af7dae9ec9d8/readthedocs/core/views/hooks.py#L106 and that endpoint is deprecated. So, I'd say that we shouldn't be calling this anymore in the future.

Regarding your proposals and considering what I wrote on this comment, maybe a Mixin class from where two different Task class inherit is the best approach: we share code, it's cleaner and easy to refactor/remove when finally we don't use the sync task alone anymore.

I will work on this implementation and see how it goes, but it should work :)

I did a commit with this change at c79f9f1

Let me know if it's clearer for you now.

Looks good! Another option could be to start using celery's builtin task chaining. Without looking deeply into this I don't know if this would clear up any confusion around the patterns we're using here. For now this accomplishes what we want however, perhaps moving to more common celery patterns can be a later batch of work.

agjohnson · 2018-01-23T11:00:47Z

readthedocs/projects/tasks.py

+            try:
+                # Hit the API ``sync_versions`` which may trigger a new build
+                # for the stable version
+                api_v2.project(self.project.pk).sync_versions.post(version_post_data)


I waffle on making this API driven, or just using executing this as a broadcast task directly. It seems calling the API is an unnecessary level of abstraction. Do you have any strong opinions here?

No, I don't.

I didn't write this code, I moved from update_imported_docs and I don't fully understand it --I just added the comment :) So, I don't have a strong opinion on this.

I'm not sure if there is historical significance here either. We should probably just address in a separate issue.

agjohnson · 2018-01-23T11:05:36Z

readthedocs/vcs_support/base.py

+        environment = os.environ.copy()
+
+        # TODO: kind of a hack
+        del environment['PATH']


Probably best anyways? I think we want tight control over this.

humitos · 2018-01-23T13:33:15Z

readthedocs/doc_builder/environments.py

+            # exception?  This will cause the first `git status` (when
+            # importing) to fail and be marked with RED in the Build command
+            # details
+            self.record_command(build_cmd)


I don't really like that because to record a command we need to be inside a build, but this LocalEnvironment knows nothings about a build. So, for this class I override it with just a pass and the *BuildEnvironment has its own to really record the commands

humitos · 2018-01-23T16:56:10Z

readthedocs/doc_builder/environments.py

+        self.commands.append(build_cmd)
+        build_cmd.run()
+
+        if record and not warn_only:


Not saving the warn_only commands involves another problem: "latex command are not being recorded" :(

The use of warn_only should produce two outcomes for commands:

warn_only=False means that commands that fail will raise exceptions and halt the build

warn_only=True means that commands that fail will be recorded as failing in the database, but an exception won't be raised. The build will continue after this.

So, I don't think we want warn_only to imply that a task is not recorded, record=False should be used explicitly if we want a command that can fail and doesn't need to be recorded.

humitos · 2018-01-24T16:16:19Z

All the commands logged with the latest changes in this branch:

humitos · 2018-01-30T21:41:27Z

In case we want to apply the same method that I used at #3560 (applying the styling in another PR to simplify the review), I wrote this command to do that in an easier way:

pre-commit run --files `git diff --name-only master $(git branch | grep \* | cut -d ' ' -f2)`

It runs pre-commit only in the files that where modified in the current branch.

agjohnson

Looks great! I'm going to give this some QA before merge, but I just raised a couple of questions on some of the notes we left in code. We can probably update this to make sure our future selves know exactly what needs to change.

agjohnson · 2018-01-24T05:35:12Z

readthedocs/doc_builder/environments.py

+        """
+        if record is None:
+            # ``self.record`` only exists when called from ``*BuildEnvironment``
+            record = getattr(self, 'record', False)


Raised in another comment of mine, it seems we probably want to respect the class record if it is False.

If we do that, once you initialize the class with False there is no way to change the behaviour for a specific command. I supposed that we want to override the default value for a specific command.

This would be confusing from my point of view:

initialize the class with False

call this method with record=True

it won't be recorded :/

To me, it's the same (opposite) case as "initialize the class with True but this and this command I don't want to record, so I call it False"

agjohnson · 2018-02-14T18:51:35Z

readthedocs/doc_builder/environments.py

@@ -310,7 +416,9 @@ def __init__(self, project=None, version=None, build=None, config=None,
        self.environment = environment or {}
        self.update_on_success = update_on_success

+        # TODO: remove this one, comes from super


Anything blocking this change?

agjohnson · 2018-02-16T14:55:49Z

readthedocs/doc_builder/environments.py

+        if record:
+            # TODO: I don't like how it's handled this entry point here since
+            # this class should know nothing about a BuildCommand (which are the
+            # only ones that can be saved/recorded)


I'm pretty sure we discussed this already, so apologies if we're rehashing this :)

What would be a better pattern to use here? Or is there something more specific we can assign to ourselves with this TODO?

Well, we don't have an strong opinion in how to improve this but I was thinking on something like how the signal works: pre_something, post_something, etc. The behaviour will be the same, but it will be correctly named instead of using record for something that know nothing about recording the commands.

So, we could call methods like pre_run_command and post_run_command so inherited classes can do whatever they want (in this example, the post_run_command will record it)

What do you think? In case you like it, another PR or the same one?

Oh, interesting thought, I do like the plan. I don't think we need to do this now, but we could always turn this todo into a ticket. I have a "Refactoring" milestone that we can assign this to if so -- this would be a great place for people to jump in and clean up code.

These are two separated tasks that share some code by inheriting from SyncRepositoryMixin. The final goal (as ``core.views.hooks._build_url`` is *DEPRECATED*) is to finally remove the SyncRepositoryTask and merge that code into UpdateDocsTask.

Either `record=False` or `warn_only=True` commands are not recorded. This attributes can be set at initialization time or when calling the `run()` method.

We need the real exit_code for some command since there are decisions based on that code, but we want to record it as success. So, the fake exit_code is saved in the database but the real exit_code is used in the whole flow of the building process.

humitos commented Jan 15, 2018

View reviewed changes

ericholscher reviewed Jan 16, 2018

View reviewed changes

agjohnson reviewed Jan 16, 2018

View reviewed changes

humitos mentioned this pull request Jan 17, 2018

Old webhooks gives 502 #3527

Closed

humitos commented Jan 19, 2018

View reviewed changes

RichardLitt added the PR: work in progress Pull request is not ready for full review label Jan 22, 2018

agjohnson mentioned this pull request Jan 23, 2018

Remove yet another command execution pattern #3540

Closed

agjohnson reviewed Jan 23, 2018

View reviewed changes

humitos commented Jan 23, 2018

View reviewed changes

This was referenced Jan 23, 2018

Refactor BuildCommand and move outside doc_builder/ #3541

Closed

Refactor run command outside a Build and Environment #3542

Closed

humitos commented Jan 23, 2018

View reviewed changes

humitos force-pushed the humitos/vcs/log-checkout branch from 67c74fa to d83515f Compare January 29, 2018 16:17

humitos added PR: ready for review and removed PR: work in progress Pull request is not ready for full review labels Jan 30, 2018

agjohnson approved these changes Feb 16, 2018

View reviewed changes

Log git checkout and expose to users

d0802ab

humitos added 23 commits February 16, 2018 11:25

Split LocalEnvironment into LocalBuildEnvironment and LocalEnvironment

e807a24

LocalEnvironment

fdca1f5

env fix

ae7ad79

Run VCS command in the proper environment

ea547d8

Define env

6247906

Proper env variables

2d012fd

Pass proper env variables when running VCS commands

fd13345

warn_only in status

9b64427

Record command when inside a BuildEnvironment

390924d

Remove --quiet options

80c964c

Do not record warn_only commands

c35b44b

Refactor

eab2f96

Add warn_only to other VCS

66094e5

Remove update_imported_docs and use UpdateDocsTask

2fb632b

Hack the Environment logs to avoid fails because no version

d209ec3

Allow to not record specific command

f7e15aa

Either `record=False` or `warn_only=True` commands are not recorded. This attributes can be set at initialization time or when calling the `run()` method.

Proper use of record, warn_only and the new force_success

d833b2d

Replace force_success to record_as_success in calls

7b34881

Lint

036d460

Fix docs styling

8c1fd17

Remove unnecessary initializing variables

30b7cef

humitos force-pushed the humitos/vcs/log-checkout branch from 83d2c26 to 30b7cef Compare February 16, 2018 16:31

This was referenced Feb 16, 2018

Allow skipping submodule clone #3215

Closed

Conditionally do git submodule operations #3627

Closed

agjohnson merged commit 3c34456 into master Feb 16, 2018

agjohnson deleted the humitos/vcs/log-checkout branch February 16, 2018 23:41

humitos mentioned this pull request Feb 19, 2018

Fix SVN initialization for command logging #3638

Merged

stsewd mentioned this pull request Dec 7, 2018

Refactor BaseEnvironment #3687

Closed

		@@ -173,7 +189,7 @@ def checkout(self, identifier=None):
		self.fetch()


		class LocalEnvironment(BaseEnvironment):

		# TODO: BuildCommand name doesn't make sense here, should be just Command

Log git checkout and expose to users #3520

Log git checkout and expose to users #3520

Conversation

humitos commented Jan 15, 2018 • edited Loading

humitos commented Jan 15, 2018

Choose a reason for hiding this comment

ericholscher commented Jan 16, 2018

ericholscher left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

humitos Jan 16, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Solution 1

Solution 2

Choose a reason for hiding this comment

ericholscher Jan 16, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

agjohnson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

agjohnson commented Jan 16, 2018

ericholscher commented Jan 16, 2018

humitos commented Jan 17, 2018

humitos commented Jan 18, 2018

humitos commented Jan 18, 2018 • edited Loading

humitos commented Jan 18, 2018

humitos left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

agjohnson commented Jan 22, 2018

humitos commented Jan 22, 2018

agjohnson commented Jan 23, 2018

agjohnson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

humitos Jan 23, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Log `git checkout` and expose to users #3520

Log `git checkout` and expose to users #3520

humitos commented Jan 15, 2018 •

edited

Loading

humitos Jan 16, 2018 •

edited

Loading

ericholscher Jan 16, 2018 •

edited

Loading

humitos commented Jan 18, 2018 •

edited

Loading

humitos Jan 23, 2018 •

edited

Loading