Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log git checkout and expose to users #3520

Merged
merged 26 commits into from
Feb 16, 2018
Merged

Conversation

humitos
Copy link
Member

@humitos humitos commented Jan 15, 2018

I've been taking a look at the code and I found some consideration that we need to take a look:

  1. BaseVCS class has it owns run method that works in a similar way than BuildEnvironment.run
    • the log_tmpl string differs
    • it adds some configurations automatically, like the cwd path
  2. All commands ran by BaseVCS are not logged as a BuildCommand
  3. BuildCommand needs to be instantiated with build_env to make be able to .save()/record the command for the current Build object.
  4. build_env could be LocalEnvironment or DockerEnvironment

Depending on how much we want to refactor, a minimalistic solution would involve:

  1. pass setup_env to update_imported_docs
  2. pass this env to version_repo.checkout
  3. under checkout consider use if that env if not None
  4. env.run with proper cwd

This is the what this PR implements but, this leaves a mix of calls to run method between BaseVCS and LocalEnvironment for the same purpose: update the repository's code. So, maybe we should consider to just pass the setup_env to the init of BaseVCS and execute everything under setup_env.

The problem of using always the setup_env (or maybe a new one called vcs_env) is that the record affects all the commands ran under that env and we want to record only clone and checkout commands --the run method could receive an optional record to override the default behaviour.

It's a good starting point to discuss a little about this.

This PR only affects to the git backend, that's why I think we need to think in a general solution under the BaseVCS class.

Closes #3397

@humitos
Copy link
Member Author

humitos commented Jan 15, 2018

Also, we may want to remove the --quiet option from the git command?

captura de pantalla_2018-01-15_15-58-05

...

captura de pantalla_2018-01-15_15-59-36

but if it fails...

captura de pantalla_2018-01-15_16-07-21

@@ -173,7 +189,7 @@ def checkout(self, identifier=None):
self.fetch()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably needs to pass the env here, also 😞

@ericholscher
Copy link
Member

I think we should be able to set a per-command option to choose whether to record that call ever. It seems like we haven't needed that in the past, but for the VCS stuff we'll want to have a subset of things that aren't recorded.

One question, is there a reason we really don't want all the VCS commands recorded? It might be extra info, but it doesn't seem inherently bad to show them.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove the BaseCLI:run() command from the VCS tools, and move to always just proxying those commands to the Environment:run(). We really shouldn't have two different ways of running commands in our build process, and the environments are a better abstraction.

@@ -593,7 +593,7 @@ def update_imported_docs(version_pk):
version_slug = version.slug
version_repo = project.vcs_repo(version_slug)

ret_dict['checkout'] = version_repo.checkout(version.identifier)
ret_dict['checkout'] = version_repo.checkout(version.identifier, env)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the setup_env here, or could we just create a new vcs_env inside checkout?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we could pass it into the original call to project.vcs_repo(version_slug) in the line above, and stored on the repo object?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not suggesting this btw, just thinking out loud. Is it worth having a vcs_env we use for all commands, or is passing in the setup_env correct? I'm not sure!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thought was to just use the setup_env. The difference between setup_env and build_env is just that one is executed on the host, and one in the vm. Separating this further might make for more places to check for env exception vs env failure vs env pass as well, so reusing setup_env ends up cleaner too.

Do you have a scenario that you think it might help to have a separate env?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think passing env into the VCS backend instantiation is the correct pattern. I envisioned completely removing the run() method on the vcs backends, as we shouldn't be duplicating code there. Perhaps there are some commands that are not important to the user to expose in the UI and we can not record those, but executing all commands through the setup_env is important for this feature.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The setup_env is not used for VCS checkouts since they are using their own run. In fact, the function/task didn't receive the setup_env, I had to added to it.

That could be a problem because that task (update_imported_docs) is called from a couple of places where a setup_env doesn't exist (e.g. from core/views/hooks.py). This could be one of the reason of why a vcs_env could be a better approach: we could create it inside the Project.vcs_repo instead of passing around through all the code or forcing other calls to create this environment just to log the commands.

Copy link
Member Author

@humitos humitos Jan 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something that we need to consider if we want to create a new vcs_env is that the envs are created with this dependencies:

  • self.project
  • self.version
  • self.config (from YAML)
  • self.build
  • record
  • env_vars. From get_env_vars, which depends on
    • self.version
    • self.project
    • self.config

In case we want to create the environment inside the Project.vcs_repo will need to consider this dependencies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at this, and I found that the only missing piece here is the build_pk (originally passed from core.utils.trigger_build after creating an Build object).

Solution 1

So, I'd say that this method update_imported_docs could receive the version_pk and build_pk (but when called from UpdateDocsTask we will be repeating a couple of actions like parsing the YAML and more) and create our own vcs_env for this. If build_pk is None we won't record (False) the commads since we don't know where to record them. This will allow other uses of this function to keep working.

Solution 2

Refactor all the things!

  • move update_imported_docs as a method of UpdateDocsTask
  • fix test_celery.py:test_update_imported_docs
  • refactor/remove core.view.hooks._build_url function that uses the function update_imported_docs
  • refactor/remove github_build, gitlab_build and bitbucket_build functions that use the _build_url but the docstrings say those functions are deprecated in favor of readthedocs.restapi.views.integrations classes (these classes use core.views.hooks._build_version which uses trigger_build which uses UpdateDocsTask --so we are safe)

I think the last solution is a better plan, but I may missing a lot of things in the middle :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We found that there are users using the old webhook endpoints. Only projects with Feature.ALLOW_DEPRECATED_WEBHOOK can hit these ones. But anyway, we can't remove them yet.

So, the refactor would include changing calls to projects.tasks.update_imported_docs,

https://github.com/rtfd/readthedocs.org/blob/445683fc0c86b0de56779447a8eedf98024dba32/readthedocs/core/views/hooks.py#L125-L129

by core.utils.trigger_build which will end up calling UpdateDocsTask in a proper way and "everything should keep working"

Copy link
Member

@ericholscher ericholscher Jan 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need think more about why we are calling into this function from other places, and build a supported API for that use case.

I believe all the other calls are so that we can re-sync the checkout on disk and the DB (to update versions on a push, for example). We should find a way to make that supported explicitly in the build tooling I think, so that we don't have to maintain a bunch of different logic. I think the best approach here is to have some kind of mechanism on the doc builds to say "Do a resync of all the on-disk stuff, but don't actually build the docs", which is mostly what your Solution 2 is talking about.

@@ -335,7 +335,7 @@ def setup_vcs(self):
self.setup_env.update_build(state=BUILD_STATE_CLONING)

self._log(msg='Updating docs from VCS')
update_imported_docs(self.version.pk)
update_imported_docs(self.version.pk, self.setup_env)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to see kwargs here. I'd love to get to a world where we're explicitly using kwargs in all function/method calls, at least until we can add type checking to the calls :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also just drop this arg. Other methods use the class local self.setup_env, so it might make sense to continue assuming this state exists.

I don't particularly like this pattern, but its at least consistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update_imported_docs is a function outside this UpdateDocsTask class (it's used from other places). See #3520 (comment)

Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output is a huge improvement, I'm really excited to have this output included and unified with the rest of the command output finally!

I'll echo the sentiment to continue cleaning up the vcs backend code and completely remove the secondary run() method. I'll also echo the uncertainty of displaying all the commands, could you post some screenshots of the commands all executed through the build environment?

@@ -335,7 +335,7 @@ def setup_vcs(self):
self.setup_env.update_build(state=BUILD_STATE_CLONING)

self._log(msg='Updating docs from VCS')
update_imported_docs(self.version.pk)
update_imported_docs(self.version.pk, self.setup_env)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also just drop this arg. Other methods use the class local self.setup_env, so it might make sense to continue assuming this state exists.

I don't particularly like this pattern, but its at least consistent.

@@ -593,7 +593,7 @@ def update_imported_docs(version_pk):
version_slug = version.slug
version_repo = project.vcs_repo(version_slug)

ret_dict['checkout'] = version_repo.checkout(version.identifier)
ret_dict['checkout'] = version_repo.checkout(version.identifier, env)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think passing env into the VCS backend instantiation is the correct pattern. I envisioned completely removing the run() method on the vcs backends, as we shouldn't be duplicating code there. Perhaps there are some commands that are not important to the user to expose in the UI and we can not record those, but executing all commands through the setup_env is important for this feature.

code, out, err = build_cmd.exit_code, build_cmd.output, build_cmd.error
else:
code, out, err = self.run('git', 'checkout',
'--force', '--quiet', revision)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your screenshots, --quiet is not what we want it seems. The output without --quiet was really helpful. So 👍 on removing that from the command.

@agjohnson
Copy link
Contributor

One more thought:

build_env could be LocalEnvironment or DockerEnvironment

We probably won't want to support this. In the case of our commercial hosting, this step is specifically kept out of the Docker build environment so that the build environment has no read access to the source deploy keys.

@ericholscher
Copy link
Member

@agjohnson I'd really like to run the VCS calls inside Docker on the .org though, so I think having it be configurable by a setting or similar is good.

@humitos humitos mentioned this pull request Jan 17, 2018
@humitos
Copy link
Member Author

humitos commented Jan 17, 2018

I'll also echo the uncertainty of displaying all the commands, could you post some screenshots of the commands all executed through the build environment?

I'm working on running all the commands inside a BuildEnvironment. These are the commands executed in an already imported project (for example when a webhook comes) so far:

In [4]: vcs = Project.objects.get(slug='poliastro').vcs_repo()

In [5]: vcs.checkout()
[17/Jan/2018 15:01:55] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git status' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:55] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git remote set-url origin https://github.com/poliastro/poliastro' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:55] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git fetch --tags --prune' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git show-ref remotes/origin/master' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git checkout --force --quiet origin/master' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git clean -d -f -f' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git submodule sync' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git submodule update --init --recursive --force' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
Out[5]: (0, '', None)

@humitos
Copy link
Member Author

humitos commented Jan 18, 2018

I just updated this PR with several changes implementing the idea that I commented you both. This is not a final solution but some code to discuss over something concrete.

Summarizing, the idea is to have a BaseEnvironment which doesn't depend on a Build where we can run arbitrary commands: this is the case of some VCS command that we need to run outside a build.

This code will probably needs some polishing and arrangement between us, but it shows the idea I think.

@agjohnson screenshot for all the VCS commands recorded

captura de pantalla_2018-01-18_12-47-44

@humitos
Copy link
Member Author

humitos commented Jan 18, 2018

Initial import...

captura de pantalla_2018-01-18_12-55-32

Note that the first command failed, but it's under warn_only so I think we shouldn't record those ones to avoid this problems.

@humitos
Copy link
Member Author

humitos commented Jan 18, 2018

Just imported repo without logging the warn_only commands:

captura de pantalla_2018-01-18_12-59-38

Copy link
Member Author

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments where I have doubts or where I wanted to explain a little more about that change.

Anyway, this code still need work for sure but I will wait for your comments before continue working on this.

cc @ericholscher @agjohnson

def _log_warn_only(self, msg):
log.warn(LOG_TEMPLATE.format(
project=self.project.slug,
version='latest',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this, but I don't want to depend on a version instance to just get the slug. I would like to think in a better way to implement it, though

# :'(
log.warn(LOG_TEMPLATE.format(
project=self.project.slug,
version=self.version.slug,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one can use the version object because the *BuildEnvironment are instantiated with this object. No real problem here.

@@ -163,7 +163,7 @@ def save_environment_json(self):
with open(self.environment_json_path(), 'w') as fpath:
# Compatibility for Py2 and Py3. ``io.TextIOWrapper`` expects
# unicode but ``json.dumps`` returns str in Py2.
fpath.write(unicode(json.dumps(data)))
fpath.write(json.dumps(data))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Someone proposed to use six.text_type, but I didn't try it yet

"""

max_retries = 5
default_retry_delay = (7 * 60)
name = __name__ + '.update_docs'

# TODO: the argument from the __init__ are used only in tests
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted this because I found these non-required attributes kind of confusing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps some docs should exist on these? Optionally, we could use a separate pattern for this, such as a separate class method used just for testing. I have no strong opinions here though, just throwing out some options.

version_repo = self.project.vcs_repo(
self.version.slug,
# When ``sync_only`` we don't a setup_env
getattr(self, 'setup_env', None),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also kind of hacky: sync_repo can be executed before (sync_only=True) or after the run_setup (the one that creates the self.setup_env we want to use here in case it exists)

@@ -21,7 +22,7 @@ class Backend(BaseVCS):

def update(self):
super(Backend, self).update()
retcode = self.run('bzr', 'status')[0]
retcode = self.run('bzr', 'status', warn_only=True)[0]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command is used to check the existence of the repository and call pull or clone for example.

@@ -66,21 +66,21 @@ def checkout_revision(self, revision=None):
revision = 'origin/%s' % branch

code, out, err = self.run('git', 'checkout',
'--force', '--quiet', revision)
'--force', revision)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are exposing the output to the user, I removed all the quiet arguments.

@@ -23,7 +24,7 @@ def pull(self):
(pull_retcode, _, _) = self.run('hg', 'pull')
if pull_retcode != 0:
raise RepositoryError
(update_retcode, stdout, stderr) = self.run('hg', 'update', '-C')
(update_retcode, stdout, stderr) = self.run('hg', 'update', '--clean')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expand the arguments to use the long form which is clearer.


"""
Base for VCS Classes.

Built on top of the BaseCLI.
VCS commands are ran inside a ``LocalEnvironment``.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, could be any Environment not just Local. By default, is LocalEnvironment

environment = os.environ.copy()

# TODO: kind of a hack
del environment['PATH']
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to delete this one because there is an assert in the code that doesn't allow the PATH in the environment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably best anyways? I think we want tight control over this.

@agjohnson
Copy link
Contributor

Now that I've seen how many commands are raised to the user, I feel like there is perhaps some filtering of commands that we need to do. I think it's important to reduce redundancy in the implementations and use a central execution pattern, but it probably isn't useful for users to see all of these commands.

For instance, if we do a git status to check for a repository, this shouldn't be surfaced to the user, and it shouldn't show as a failed command, as it isn't important from the user's perspective. Commands like git checkout and git subodule ... are important though.

This will probably also get around the issue that a command like git status will normally fail and execution will continue regardless. We don't need to show this as a failure to the user.

@humitos
Copy link
Member Author

humitos commented Jan 22, 2018

To add another place where we are running commands outside a Build and even outside an Environment, I just found this:

https://github.com/rtfd/readthedocs.org/blob/1af444173481df9a7b16e015fc6b12abe8155e7e/readthedocs/projects/utils.py#L48-L107

(used for symlinks, for example)

@RichardLitt RichardLitt added the PR: work in progress Pull request is not ready for full review label Jan 22, 2018
@agjohnson
Copy link
Contributor

Yeah, that's the other one that needs to be unified. Let's address that separately though. I'll open a ticket for that.

Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I could use some more explanation around moving the update docs task into the main build task, as it seems like a conditional use of the main task. Maybe all that is required here is an abstraction though.

We already discussed the addition of a record on run(), but i agree the verbose cli options is a necessary addition for all of the vcs commands. It makes the user view more obvious and makes it easier for development as well.


class LocalEnvironment(BaseEnvironment):

# TODO: BuildCommand name doesn't make sense here, should be just Command
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, this meaning has changed now. 👍 on changing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, maybe this is a later refactor after all. If we generalize the naming, we should probably also move out of doc_builder/ app.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened a ticket for this: #3541

kwargs['environment'] = self.environment

# TODO: ``build_env`` is passed as ``kwargs`` when it's called from a ``*BuildEnvironment``
# kwargs['build_env'] = self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow this comment, but I assume this note makes more sense for you :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to leave that comment to discuss a little about this.

build_env was self, but now if it comes, it's from the overriden run method from the class that is calling this method.

Sounds a little elaborated, that's why I'd appreciate your thoughts on this.

self.commands.append(build_cmd)
build_cmd.run()

# TODO: I don't like how it's handled this entry point here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your concern here? I can see how warn_only would be confusing as we're now considering several different return states:

  • Command passed, report
  • Command failed, report
  • Command failed, report and fail the build
  • Command passed, don't report
  • Command failed, don't report
  • Command failed, don't report but fail the build (this probably shouldn't be used ever, but maybe there are steps that should raise a general error?)
  • Command failed, but report as pass? (I don't know if this still exists though)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO comment was moved accidentally. It belongs to this line: https://github.com/rtfd/readthedocs.org/pull/3520/files/7f6c098983acb4bb3afc90e5c780af7dae9ec9d8#r163242699

I agree with the return states, but I'd say that the last two shouldn't exist at all. The rest, are needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Command failed, but report as pass? (I don't know if this still exists though)

This is tricky, but it does exist. Although, we are not using it but maybe we need it.

For example, hg tags command. I suppose that we want to record=True it but we want to warn_only=True also, since it could fail but we don't want the Build to fail. Besides, if it's recorded and failed (exit status != 0) it will be presented in red to the user and we don't want that.

Is that a good example of this use case?

Copy link
Member Author

@humitos humitos Jan 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sumarizing, I think we want the warn_only commands to be recorded but don't interfers in the build status nor in the command itself (when presented to the user).

Maybe what we need to change is the BuildCommandResultMixin.successful property to something like:

return self.exit_code == 0 and self.warn_only == False

but that would involve to save the warn_only in the database also and I don't know if this is getting more and more complicated :/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possible idea is that if warn_only we record it, but we set the exit_code as 0 (this could end up with other problems like the Latex build fail and we will never know it, even when opening the Build interface since it won't be marked as failed -in red).

So, add another attribute? warn_only and force_success 😛 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Its sounding like we also want a force_success option as well here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh, I just realized that we need something like record_as_success instead of force_success because we are using git status exit code to decide if clone or pull, so we need the real exit code in the flow but we want to save a fake one in the databse...

if not warn_only:
# TODO: maybe receive ``record`` as an attribute for skip/record
# just specific commands but not all of them ran under the
# *BuildEnvironment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This note sounds like we want a record kwarg on command execution? I don't quite understand the comment on *BuildEnvironment though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe.

Yes, I was thinking on having the record attribute passed in the run method to override the one from the __init__ from *BuildEnvironment classes (now we have LocalEnvironment and (Local/Docker)BuildEnvironment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, i think a record arg on run() makes sense. Given __init__(record=False, ...), and run(record=True, ...), what do think we should do? I'd probably say don't record if __init__(record=False), thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to follow what run(record=) is set independently how was the class initialized.

Why? Because sometimes the LocalEnvironment is initialized far away (in another file and just passed as argument) from the code where the command is ran and you want/need to override the default record.

I implemented this at https://github.com/rtfd/readthedocs.org/blob/e1766a4d1061f7e97c75039b84a196088b111164/readthedocs/doc_builder/environments.py#L310-L312

# TODO: do we want to save commands that FAILED but not raised and
# exception? This will cause the first `git status` (when
# importing) to fail and be marked with RED in the Build command
# details
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised in general review of this, I'd say having a record option that wouldn't record this command at all is a better pattern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need both: record=False and warn_only=True In fact, I'm thinking that record=False should implicit mean warn_only=True because the False case doesn't make sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good point. I think you're right.

"""

max_retries = 5
default_retry_delay = (7 * 60)
name = __name__ + '.update_docs'

# TODO: the argument from the __init__ are used only in tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps some docs should exist on these? Optionally, we could use a separate pattern for this, such as a separate class method used just for testing. I have no strong opinions here though, just throwing out some options.

"""
Run a documentation build.
Run a documentation sync or sync n' build.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does sync mean in this context?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sync means: "update docs from VCS (sync tags/branches and trigger stable build if the stable version has changed)"

Where sync is not a good summary for that :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roger, this makes sense.

We do describe a lot of things as syncing (we also overuse the words backend and environment). This doesn't need to change though, I just was looking for some clarification to help understand the changes.

@@ -149,9 +145,15 @@ def run(self, pk, version_pk=None, build_pk=None, record=True,
self.build_force = force
self.config = None

if sync_only:
self.sync_repo()
return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding my note above on the significance of sync, is there a strong reason to make this task conditionally operate like this? It seems like this should be a separate task instead?

If the purpose of this is code reuse, perhaps a mixin class that two tasks can share would make more sense.

If the sync operation requires the same amount of arguments/etc, then it's probably not a strong case for task isolation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that I talked to @ericholscher, and I saw it very clear at the beginning while we were talking, but the implementation is not as good as I imagined.

I wanted to avoid the needing of handling the LocalEnvironment inside the update_imported_docs and also try to avoid the "jump to a middle step of the flow": update_imported_docs is like the third step of the whole UpdateDocsTask and was used from outside also -I didn't like that.

Besides, the use of this sync_only from outside was done only from https://github.com/rtfd/readthedocs.org/blob/7f6c098983acb4bb3afc90e5c780af7dae9ec9d8/readthedocs/core/views/hooks.py#L106 and that endpoint is deprecated. So, I'd say that we shouldn't be calling this anymore in the future.


Regarding your proposals and considering what I wrote on this comment, maybe a Mixin class from where two different Task class inherit is the best approach: we share code, it's cleaner and easy to refactor/remove when finally we don't use the sync task alone anymore.

I will work on this implementation and see how it goes, but it should work :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a commit with this change at c79f9f1

Let me know if it's clearer for you now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Another option could be to start using celery's builtin task chaining. Without looking deeply into this I don't know if this would clear up any confusion around the patterns we're using here. For now this accomplishes what we want however, perhaps moving to more common celery patterns can be a later batch of work.

try:
# Hit the API ``sync_versions`` which may trigger a new build
# for the stable version
api_v2.project(self.project.pk).sync_versions.post(version_post_data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I waffle on making this API driven, or just using executing this as a broadcast task directly. It seems calling the API is an unnecessary level of abstraction. Do you have any strong opinions here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't.

I didn't write this code, I moved from update_imported_docs and I don't fully understand it --I just added the comment :) So, I don't have a strong opinion on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there is historical significance here either. We should probably just address in a separate issue.

environment = os.environ.copy()

# TODO: kind of a hack
del environment['PATH']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably best anyways? I think we want tight control over this.

# exception? This will cause the first `git status` (when
# importing) to fail and be marked with RED in the Build command
# details
self.record_command(build_cmd)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like that because to record a command we need to be inside a build, but this LocalEnvironment knows nothings about a build. So, for this class I override it with just a pass and the *BuildEnvironment has its own to really record the commands

self.commands.append(build_cmd)
build_cmd.run()

if record and not warn_only:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not saving the warn_only commands involves another problem: "latex command are not being recorded" :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of warn_only should produce two outcomes for commands:

  • warn_only=False means that commands that fail will raise exceptions and halt the build
  • warn_only=True means that commands that fail will be recorded as failing in the database, but an exception won't be raised. The build will continue after this.

So, I don't think we want warn_only to imply that a task is not recorded, record=False should be used explicitly if we want a command that can fail and doesn't need to be recorded.

@humitos
Copy link
Member Author

humitos commented Jan 24, 2018

All the commands logged with the latest changes in this branch:

captura de pantalla_2018-01-24_11-15-51

@humitos humitos added PR: ready for review and removed PR: work in progress Pull request is not ready for full review labels Jan 30, 2018
@humitos
Copy link
Member Author

humitos commented Jan 30, 2018

In case we want to apply the same method that I used at #3560 (applying the styling in another PR to simplify the review), I wrote this command to do that in an easier way:

pre-commit run --files `git diff --name-only master $(git branch | grep \* | cut -d ' ' -f2)`

It runs pre-commit only in the files that where modified in the current branch.

Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I'm going to give this some QA before merge, but I just raised a couple of questions on some of the notes we left in code. We can probably update this to make sure our future selves know exactly what needs to change.

"""
if record is None:
# ``self.record`` only exists when called from ``*BuildEnvironment``
record = getattr(self, 'record', False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised in another comment of mine, it seems we probably want to respect the class record if it is False.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do that, once you initialize the class with False there is no way to change the behaviour for a specific command. I supposed that we want to override the default value for a specific command.

This would be confusing from my point of view:

  1. initialize the class with False
  2. call this method with record=True
  3. it won't be recorded :/

To me, it's the same (opposite) case as "initialize the class with True but this and this command I don't want to record, so I call it False"

@@ -310,7 +416,9 @@ def __init__(self, project=None, version=None, build=None, config=None,
self.environment = environment or {}
self.update_on_success = update_on_success

# TODO: remove this one, comes from super
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything blocking this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Done

if record:
# TODO: I don't like how it's handled this entry point here since
# this class should know nothing about a BuildCommand (which are the
# only ones that can be saved/recorded)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure we discussed this already, so apologies if we're rehashing this :)

What would be a better pattern to use here? Or is there something more specific we can assign to ourselves with this TODO?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we don't have an strong opinion in how to improve this but I was thinking on something like how the signal works: pre_something, post_something, etc. The behaviour will be the same, but it will be correctly named instead of using record for something that know nothing about recording the commands.

So, we could call methods like pre_run_command and post_run_command so inherited classes can do whatever they want (in this example, the post_run_command will record it)

What do you think? In case you like it, another PR or the same one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, interesting thought, I do like the plan. I don't think we need to do this now, but we could always turn this todo into a ticket. I have a "Refactoring" milestone that we can assign this to if so -- this would be a great place for people to jump in and clean up code.

These are two separated tasks that share some code by inheriting from
SyncRepositoryMixin.

The final goal (as ``core.views.hooks._build_url`` is *DEPRECATED*) is
to finally remove the SyncRepositoryTask and merge that code into
UpdateDocsTask.
Either `record=False` or `warn_only=True` commands are not
recorded. This attributes can be set at initialization time or when
calling the `run()` method.
We need the real exit_code for some command since there are decisions
based on that code, but we want to record it as success.

So, the fake exit_code is saved in the database but the real exit_code
is used in the whole flow of the building process.
@agjohnson agjohnson merged commit 3c34456 into master Feb 16, 2018
@agjohnson agjohnson deleted the humitos/vcs/log-checkout branch February 16, 2018 23:41
@stsewd stsewd mentioned this pull request Dec 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants