Added a `remotecat` command for printing remote file of running calcjobs #4861

zhubonan · 2021-04-17T15:04:11Z

This is a command that I found useful - perhaps it is good to commit it to the core.
It can be useful for monitoring the calcjob while they are running.

I have not yet able to add tests for it - the existing test framework for calcjob commands does not include remote contents (the nodes are extracted from an archive). Not sure how to make the test work, perhaps the only way is to mock the getfile method of the RemoteData node?

ramirezfranciscof · 2021-04-19T09:33:12Z

Hey @zhubonan , thanks for the contribution! We are currently doing/discussing some changes in how we are handling the connections for external resources (in particular, ssh connections), so we need to consider how this feature fits into all that. Given that it is a command line feature, it might still be useful to have something like it but perhaps the underlying workings might need to be different.

I'm pinging @sphuber and @giovannipizzi if they can take a look. Anyways, I'll try to bring the subject up on the next AiiDA meeting (it is this wednesday at 15:00 CEST, these are open for participation so let me know if you want to be present).

zhubonan · 2021-04-19T09:37:54Z

I see. This is just a small feature that I think might be useful. It does not touch the ssh interface though (only RemoteData.getfile is used).

Feel free to come back to this later.

mbercx · 2021-04-19T09:56:13Z

+1 I'd definitely use this feature. I'm wondering if it can also be extended to monitor the calculation. I typically do this with the tail command, after doing verdi calcjob gotocomputer:

$ tail -f aiida.out

It would be nice to just be able to do e.g.:

verdi calcjob remotecat --monitor <PK>

ltalirz · 2021-04-19T10:19:57Z

Hi @zhubonan , thanks for this contribution!

have not yet able to add tests for it - the existing test framework for calcjob commands does not include remote contents (the nodes are extracted from an archive).

You could add a test to the "system tests", some of which run calculations on a slurm docker container on github actions.
See e.g.

aiida-core/.github/system_tests/pytest/test_memory_leaks.py

Lines 48 to 62 in fa8b243

    
           def test_leak_ssh_calcjob(): 
        
               """Test whether running a CalcJob over SSH leaks memory. 
        
               Note: This relies on the 'slurm-ssh' computer being set up. 
        
               """ 
        
               code = orm.Code( 
        
                   input_plugin_name='arithmetic.add', remote_computer_exec=[orm.load_computer('slurm-ssh'), '/bin/bash'] 
        
               ) 
        
               inputs = {'x': orm.Int(1), 'y': orm.Int(2), 'code': code} 
        
               run_finished_ok(ArithmeticAddCalculation, **inputs) 
        
               # check that no reference to the process is left in memory 
        
               # some delay is necessary in order to allow for all callbacks to finish 
        
               process_instances = get_instances(processes.Process, delay=0.2) 
        
               assert not process_instances, f'Memory leak: process instances remain in memory: {process_instances}'

I might not be the right person to review this PR but we'll discuss it at the meeting

ramirezfranciscof · 2021-04-19T10:30:25Z

It does not touch the ssh interface though (only RemoteData.getfile is used).

Right, so this is one of the things. The RemoteData methods open and close the ssh connection with the external resource every time you call them. We are considering changing this so that you need to first open the transport yourself and then pass the open connection to the RemoteData method you want to use (this is slightly more complicated but would allow users to systematically copy files in a more efficient way that may currently be prohibiting for some use cases). This might require some adaptation of this feature; I just want to make sure we are considering all angles before going through since this is related to a part of the code that is being re-designed.

sphuber · 2021-04-19T12:04:14Z

I think that in principle it would be ok to add this command even though we are not yet sure of how the transports may change, since if and when we change it, it should not affect the interface and so the user shouldn't be affected.

That being said: in a sense this command already exists in verdi data remote cat. Now I realize that this command provides some shortcuts as in it automatically gets the PK of the remote_folder and it can get the default output name if the plugin specifies it. So there might be an argument to add this anyway, but then i would consider having the implementation be as close to the other as possible, or at least use the implementation under the hood.

zhubonan · 2021-04-19T13:10:30Z

That being said: in a sense this command already exists in verdi data remote cat.
@sphuber I see. I will reflector the code to use the implement that verdi data remote cat uses.

+1 I'd definitely use this feature. I'm wondering if it can also be extended to monitor the calculation. I typically do this with the tail command, after doing verdi calcjob gotocomputer:

@mbercx I have coded this up as well actually based on gotocomputer. The RemoteData interface does not allow tracking a file easily. I will see if I can add it to the PR as well.

codecov · 2021-04-19T13:55:24Z

Codecov Report

Base: 79.69% // Head: 80.10% // Increases project coverage by +0.42% 🎉

Coverage data is based on head (9082559) compared to base (e5cfd84).
Patch coverage: 64.29% of modified lines in pull request are covered.

❗ Current head 9082559 differs from pull request most recent head 412964f. Consider uploading reports for the commit 412964f to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4861      +/-   ##
==========================================
+ Coverage   79.69%   80.10%   +0.42%     
==========================================
  Files         545      515      -30     
  Lines       39150    36742    -2408     
==========================================
- Hits        31195    29427    -1768     
+ Misses       7955     7315     -640

Flag	Coverage Δ
django	`74.58% <64.29%> (?)`
sqlalchemy	`73.50% <64.29%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
aiida/cmdline/commands/cmd_calcjob.py	`66.50% <60.00%> (-13.37%)`	⬇️
aiida/transports/plugins/local.py	`81.41% <100.00%> (-0.93%)`	⬇️
aiida/transports/plugins/ssh.py	`80.14% <100.00%> (+0.51%)`	⬆️
aiida/transports/transport.py	`66.22% <100.00%> (-17.19%)`	⬇️
aiida/engine/exceptions.py	`0.00% <0.00%> (-100.00%)`	⬇️
aiida/cmdline/utils/ascii_vis.py	`0.00% <0.00%> (-59.67%)`	⬇️
aiida/cmdline/commands/cmd_setup.py	`50.00% <0.00%> (-45.19%)`	⬇️
aiida/orm/nodes/data/array/xy.py	`16.18% <0.00%> (-43.82%)`	⬇️
aiida/cmdline/params/options/commands/setup.py	`56.85% <0.00%> (-36.22%)`	⬇️
aiida/cmdline/commands/cmd_devel.py	`31.58% <0.00%> (-35.74%)`	⬇️
... and 713 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

zhubonan · 2021-06-15T22:29:54Z

@ltalirz @sphuber I have updated the code and added a test. This PR is ready to be merged now.

chrisjsewell · 2021-06-16T06:34:50Z

Hey thanks @zhubonan, I'm seeing a lot of lines not covered by tests 😬 would it be possible to look into increasing the test coverage for these scenarios?

chrisjsewell

increase test coverage?

zhubonan · 2021-06-22T13:21:11Z

@chrisjsewell I tried to improve the coverages but it turns out additional changes are needed to support the --monitor option for multiple transport plugins.
I have tweaked the gotocomputer_command method to allow additional arguments to be passed to facilitate running tail -f <filename>. However, this command cannot be tested directly as the os.system will never return I think, just like that for the standard gotocomputercommand.

sphuber · 2022-12-13T16:49:38Z

@zhubonan I have rebased the PR to the latest main branch and addressed the merge conflicts. I have also added a commit with some minor improvements. It includes a test for the --monitor option so this should increase the coverage (although I am not sure this is even running on PRs anymore 🤔 )

In looking into the code more closely, I have just a question of (ab)using the Transport.gotocomputer_command. Executing arbitrary commands clearly wasn't the intention. It seems we should rather maybe use something like Transport.exec_command_wait except that won't work in the case of tail -f since it is blocking and we don't want to wait for it to finish. Still I am wondering if we should add a Transport.exec_command instead that just works blockingly and pipes output through. The reason I am a bit hesitant to just use your approach is that we are considering to refactor the Transport interface, since this might become necessary for 2FA that is starting to become common-place and the interface mixes file transport and remote command execution. Not sure if this change would make the redesign more complicated.

zhubonan · 2022-12-13T18:05:29Z

@sphuber Thanks for looking into it. If there is a Transport refecrtoring going to happen probably it does not make sense to modify the interface now to add the --monitor functionality.

Perhaps we can just remove the --monior option for now? It is only a nice to have addition I think.

sphuber · 2022-12-14T13:40:37Z

Perhaps we can just remove the --monior option for now? It is only a nice to have addition I think.

Works for me. Let me create a backup branch for you on your fork and then I will remove the option and we can merge this.

This can be usedful for monitoring the calcjobs while they are running.

Also adds a test for the `--monitor` option.

sphuber

Thanks @zhubonan

sphuber · 2022-12-14T15:38:52Z

@zhubonan I cannot seem to push the branch with the monitor option code to your remote. But I will add the patch here for posterity:

From 02d9fa15a508cc15513f64387adc7171e16d77cb Mon Sep 17 00:00:00 2001
From: Bonan Zhu <[email protected]>
Date: Sat, 17 Apr 2021 15:57:27 +0100
Subject: [PATCH] Added a remotecat command for printing remote file

This can be usedful for monitoring the calcjobs while they are
running.

Various minor improvements

Also adds a test for the `--monitor` option.
---
 aiida/cmdline/commands/cmd_calcjob.py  | 20 +++++++++++++++++++-
 aiida/transports/plugins/local.py      |  4 ++--
 aiida/transports/plugins/ssh.py        |  4 ++--
 aiida/transports/transport.py          | 12 ++++++++----
 tests/cmdline/commands/test_calcjob.py | 11 ++++++++++-
 tests/transports/test_local.py         |  9 +++++++++
 tests/transports/test_ssh.py           |  9 +++++++++
 7 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/aiida/cmdline/commands/cmd_calcjob.py b/aiida/cmdline/commands/cmd_calcjob.py
index 72cec957f..da542942f 100644
--- a/aiida/cmdline/commands/cmd_calcjob.py
+++ b/aiida/cmdline/commands/cmd_calcjob.py
@@ -124,8 +124,9 @@ def calcjob_inputcat(calcjob, path):
 @verdi_calcjob.command('remotecat')
 @arguments.CALCULATION('calcjob', type=CalculationParamType(sub_classes=('aiida.node:process.calculation.calcjob',)))
 @click.argument('path', type=str, required=False)
+@click.option('--monitor', is_flag=True, default=False, help='Monitor the file using `tail -f` instead.')
 @decorators.with_dbenv()
-def calcjob_remotecat(calcjob, path):
+def calcjob_remotecat(calcjob, path, monitor):
     """Show the contents of a file in the remote working directory.
 
     The file to show can be specified using the PATH argument. If PATH is not specified, the default output file path
@@ -135,8 +136,25 @@ def calcjob_remotecat(calcjob, path):
     import sys
     import tempfile
 
+    from aiida.common.exceptions import NotExistent
+
     remote_folder, path = get_remote_and_path(calcjob, path)
 
+    if monitor:
+        try:
+            transport = calcjob.get_transport()
+        except NotExistent as exception:
+            echo.echo_critical(str(exception))
+
+        remote_workdir = calcjob.get_remote_workdir()
+
+        if not remote_workdir:
+            echo.echo_critical('no remote work directory for this calcjob, maybe it has not yet started running?')
+        cmds = f"-c 'tail -f {path}'"
+        command = transport.gotocomputer_command(remote_workdir, cmds)
+        os.system(command)
+        return
+
     with tempfile.NamedTemporaryFile() as tmp_path:
         try:
             remote_folder.getfile(path, tmp_path.name)
diff --git a/aiida/transports/plugins/local.py b/aiida/transports/plugins/local.py
index c41f4f276..cc56a76a2 100644
--- a/aiida/transports/plugins/local.py
+++ b/aiida/transports/plugins/local.py
@@ -817,7 +817,7 @@ class LocalTransport(Transport):
 
         return retval, output_text, stderr_text
 
-    def gotocomputer_command(self, remotedir):
+    def gotocomputer_command(self, remotedir, extra_args=None):
         """
         Return a string to be run using os.system in order to connect
         via the transport to the remote directory.
@@ -829,7 +829,7 @@ class LocalTransport(Transport):
 
         :param str remotedir: the full path of the remote directory
         """
-        connect_string = self._gotocomputer_string(remotedir)
+        connect_string = self._gotocomputer_string(remotedir, extra_args)
         cmd = f'bash -c {connect_string}'
         return cmd
 
diff --git a/aiida/transports/plugins/ssh.py b/aiida/transports/plugins/ssh.py
index f47c9ef19..4849ce766 100644
--- a/aiida/transports/plugins/ssh.py
+++ b/aiida/transports/plugins/ssh.py
@@ -1482,7 +1482,7 @@ class SshTransport(Transport):  # pylint: disable=too-many-public-methods
 
         return (retval, b''.join(stdout_bytes), b''.join(stderr_bytes))
 
-    def gotocomputer_command(self, remotedir):
+    def gotocomputer_command(self, remotedir, extra_args=None):
         """
         Specific gotocomputer string to connect to a given remote computer via
         ssh and directly go to the calculation folder.
@@ -1505,7 +1505,7 @@ class SshTransport(Transport):  # pylint: disable=too-many-public-methods
 
         further_params_str = ' '.join(further_params)
 
-        connect_string = self._gotocomputer_string(remotedir)
+        connect_string = self._gotocomputer_string(remotedir, extra_args)
         cmd = f'ssh -t {self._machine} {further_params_str} {connect_string}'
         return cmd
 
diff --git a/aiida/transports/transport.py b/aiida/transports/transport.py
index 1f3a01c75..0031e5770 100644
--- a/aiida/transports/transport.py
+++ b/aiida/transports/transport.py
@@ -688,7 +688,7 @@ class Transport(abc.ABC):
         """
 
     @abc.abstractmethod
-    def gotocomputer_command(self, remotedir):
+    def gotocomputer_command(self, remotedir, extra_args=None):
         """
         Return a string to be run using os.system in order to connect
         via the transport to the remote directory.
@@ -700,6 +700,7 @@ class Transport(abc.ABC):
         * A reasonable error message is produced if the folder does not exist
 
         :param str remotedir: the full path of the remote directory
+        :param str extra_args: optional extra arguments to be passed to the shell
         """
 
     @abc.abstractmethod
@@ -815,13 +816,16 @@ class Transport(abc.ABC):
     def has_magic(self, string):
         return self._MAGIC_CHECK.search(string) is not None
 
-    def _gotocomputer_string(self, remotedir):
+    def _gotocomputer_string(self, remotedir, extra_args=None):
         """command executed when goto computer."""
         connect_string = (
             """ "if [ -d {escaped_remotedir} ] ;"""
-            """ then cd {escaped_remotedir} ; {bash_command} ; else echo '  ** The directory' ; """
+            """ then cd {escaped_remotedir} ; {bash_command} {extra_args}; else echo '  ** The directory' ; """
             """echo '  ** {remotedir}' ; echo '  ** seems to have been deleted, I logout...' ; fi" """.format(
-                bash_command=self._bash_command_str, escaped_remotedir="'{}'".format(remotedir), remotedir=remotedir
+                bash_command=self._bash_command_str,
+                extra_args=extra_args or '',
+                escaped_remotedir="'{}'".format(remotedir),
+                remotedir=remotedir
             )
         )
 
diff --git a/tests/cmdline/commands/test_calcjob.py b/tests/cmdline/commands/test_calcjob.py
index a763b1275..801761bf6 100644
--- a/tests/cmdline/commands/test_calcjob.py
+++ b/tests/cmdline/commands/test_calcjob.py
@@ -334,9 +334,11 @@ class TestVerdiCalculation:
         assert len(get_result_lines(result)) == 1
         assert get_result_lines(result)[0] == '5'
 
-    def test_calcjob_remotecat(self):
+    def test_calcjob_remotecat(self, monkeypatch):
         """Test the remotecat command that prints the remote file for a given calcjob"""
         # Specifying no filtering options and no explicit calcjobs should exit with non-zero status
+        import os
+
         options = []
         result = self.cli_runner.invoke(command.calcjob_remotecat, options)
         assert result.exception is not None, result.output
@@ -357,3 +359,10 @@ class TestVerdiCalculation:
         options = [str(self.result_job.uuid), 'fileA.txt']
         result = self.cli_runner.invoke(command.calcjob_remotecat, options)
         assert result.stdout == 'test stringA'
+
+        # To test the ``--monitor`` option, mock the ``os.system`` command and simply print the command it receives.
+        with monkeypatch.context() as ctx:
+            ctx.setattr(os, 'system', lambda x: print(x))  # pylint: disable=unnecessary-lambda
+            options = ['--monitor', str(self.result_job.uuid), 'fileA.txt']
+            result = self.cli_runner.invoke(command.calcjob_remotecat, options)
+            assert "bash -l  -c 'tail -f fileA.txt'" in result.stdout
diff --git a/tests/transports/test_local.py b/tests/transports/test_local.py
index fa88e0046..102f0323d 100644
--- a/tests/transports/test_local.py
+++ b/tests/transports/test_local.py
@@ -61,3 +61,12 @@ def test_gotocomputer():
             """echo '  ** /remote_dir/' ; echo '  ** seems to have been deleted, I logout...' ; fi" """
         )
         assert cmd_str == expected_str
+
+        cmd_str = transport.gotocomputer_command('/remote_dir/', "-c 'echo Hello World'")
+
+        expected_str = (
+            """bash -c  "if [ -d '/remote_dir/' ] ;"""
+            """ then cd '/remote_dir/' ; bash -l  -c 'echo Hello World'; else echo '  ** The directory' ; """
+            """echo '  ** /remote_dir/' ; echo '  ** seems to have been deleted, I logout...' ; fi" """
+        )
+        assert cmd_str == expected_str
diff --git a/tests/transports/test_ssh.py b/tests/transports/test_ssh.py
index 8b2043f87..958748723 100644
--- a/tests/transports/test_ssh.py
+++ b/tests/transports/test_ssh.py
@@ -124,6 +124,15 @@ def test_gotocomputer():
         )
         assert cmd_str == expected_str
 
+        cmd_str = transport.gotocomputer_command('/remote_dir/', '-c "echo Hello World"')
+
+        expected_str = (
+            """ssh -t localhost -o ProxyCommand='ssh -W localhost:22 localhost'  "if [ -d '/remote_dir/' ] ;"""
+            """ then cd '/remote_dir/' ; bash  -c "echo Hello World"; else echo '  ** The directory' ; """
+            """echo '  ** /remote_dir/' ; echo '  ** seems to have been deleted, I logout...' ; fi" """
+        )
+        assert cmd_str == expected_str
+
 
 def test_gotocomputer_proxyjump():
     """Test gotocomputer"""
-- 
2.25.1

zhubonan · 2022-12-27T09:59:59Z

Thanks for the patch 😄 . I think I will just delete my branch since the PR is now merged.

zhubonan requested a review from ltalirz April 17, 2021 15:04

ramirezfranciscof self-assigned this Apr 28, 2021

zhubonan force-pushed the pr-add-remotecat-command branch from 559c147 to 9770b11 Compare June 14, 2021 19:52

chrisjsewell suggested changes Jun 16, 2021

View reviewed changes

sphuber force-pushed the pr-add-remotecat-command branch 2 times, most recently from 412964f to 0132a8b Compare December 13, 2022 16:38

sphuber force-pushed the pr-add-remotecat-command branch 2 times, most recently from e6e3f76 to 294ed36 Compare December 14, 2022 13:57

zhubonan and others added 2 commits December 14, 2022 16:02

Added a remotecat command for printing remote file

bfba8e9

This can be usedful for monitoring the calcjobs while they are running.

Various minor improvements

4fbaea2

Also adds a test for the `--monitor` option.

sphuber force-pushed the pr-add-remotecat-command branch from 294ed36 to 6bcc260 Compare December 14, 2022 15:06

Remove the --monitor option

46bb011

sphuber force-pushed the pr-add-remotecat-command branch from 6bcc260 to 46bb011 Compare December 14, 2022 15:07

sphuber self-requested a review December 14, 2022 15:27

sphuber approved these changes Dec 14, 2022

View reviewed changes

sphuber merged commit 7df4f60 into aiidateam:main Dec 14, 2022

zhubonan deleted the pr-add-remotecat-command branch December 27, 2022 10:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added a `remotecat` command for printing remote file of running calcjobs #4861

Added a `remotecat` command for printing remote file of running calcjobs #4861

zhubonan commented Apr 17, 2021

ramirezfranciscof commented Apr 19, 2021 •

edited

Loading

zhubonan commented Apr 19, 2021

mbercx commented Apr 19, 2021

ltalirz commented Apr 19, 2021 •

edited

Loading

ramirezfranciscof commented Apr 19, 2021

sphuber commented Apr 19, 2021

zhubonan commented Apr 19, 2021

codecov bot commented Apr 19, 2021 •

edited

Loading

zhubonan commented Jun 15, 2021

chrisjsewell commented Jun 16, 2021 •

edited

Loading

chrisjsewell left a comment

zhubonan commented Jun 22, 2021 •

edited

Loading

sphuber commented Dec 13, 2022

zhubonan commented Dec 13, 2022

sphuber commented Dec 14, 2022

sphuber left a comment

sphuber commented Dec 14, 2022

zhubonan commented Dec 27, 2022

Added a remotecat command for printing remote file of running calcjobs #4861

Added a remotecat command for printing remote file of running calcjobs #4861

Conversation

zhubonan commented Apr 17, 2021

ramirezfranciscof commented Apr 19, 2021 • edited Loading

zhubonan commented Apr 19, 2021

mbercx commented Apr 19, 2021

ltalirz commented Apr 19, 2021 • edited Loading

ramirezfranciscof commented Apr 19, 2021

sphuber commented Apr 19, 2021

zhubonan commented Apr 19, 2021

codecov bot commented Apr 19, 2021 • edited Loading

Codecov Report

zhubonan commented Jun 15, 2021

chrisjsewell commented Jun 16, 2021 • edited Loading

chrisjsewell left a comment

Choose a reason for hiding this comment

zhubonan commented Jun 22, 2021 • edited Loading

sphuber commented Dec 13, 2022

zhubonan commented Dec 13, 2022

sphuber commented Dec 14, 2022

sphuber left a comment

Choose a reason for hiding this comment

sphuber commented Dec 14, 2022

zhubonan commented Dec 27, 2022

Added a `remotecat` command for printing remote file of running calcjobs #4861

Added a `remotecat` command for printing remote file of running calcjobs #4861

ramirezfranciscof commented Apr 19, 2021 •

edited

Loading

ltalirz commented Apr 19, 2021 •

edited

Loading

codecov bot commented Apr 19, 2021 •

edited

Loading

chrisjsewell commented Jun 16, 2021 •

edited

Loading

zhubonan commented Jun 22, 2021 •

edited

Loading