Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reading large files over SCP #621

Merged
merged 6 commits into from
Jun 27, 2024
Merged

Fix reading large files over SCP #621

merged 6 commits into from
Jun 27, 2024

Conversation

Jakuje
Copy link
Contributor

@Jakuje Jakuje commented Jun 21, 2024

SUMMARY

The current SCP code does not handle large reads. Instead just allocates the large buffer fills up to 64k bytes and then writes the whole to the file (most likely the rest filled with junk).

The libssh returns the number of read bytes from ssh_scp_read() and for practical reasons does not allocate the whole file in memory. The reads are capped to 64k B.

This PR fixes the read by both not allocating the entire file size in memory and reading it by chunks (that much max libssh chunk atm) and writing that to the local file.

ISSUE TYPE
  • Bugfix Pull Request
ADDITIONAL INFORMATION

The attached test case is failing without the change, with errors like this:

_______________________________________________________________________ test_get_large _______________________________________________________________________
[gw0] linux -- Python 3.12.3 /home/jjelen/devel/pylibssh/.tox/python/bin/python

dst_path = PosixPath('/tmp/pytest-of-jjelen/pytest-35/popen-gw0/test_get_large0/dst-file.txt')
src_path_large = PosixPath('/tmp/pytest-of-jjelen/pytest-35/popen-gw0/test_get_large0/large.txt'), ssh_scp = <pylibsshext.scp.SCP object at 0x7f3f83cffac0>
large_payload = b'&PE~~,!=1@F7RaUccg#)q-ru^~f8Zv/-{~tFE-\\>v2_b}i@`N/]_%*~{oD`"F`*4Ns]HK^lewILvC1O[]gFC9(27%MuX$s&24et<dRJ$"& zf_XIqm7...**<J_;?uI47vm\'/A"xi5a!kQXoL=M}ne;$E[B=EpasVbRO&\\zy7t,GK yu;l*$s<#P:sJLi@od#:&:m?8]9KFN4@QAF8hD, ~T@WNP;)%+>)TV\'ODi9'

    def test_get_large(dst_path, src_path_large, ssh_scp, large_payload):
        """Check that SCP file download works also for large files."""
>       ssh_scp.get(str(src_path_large), str(dst_path))

dst_path   = PosixPath('/tmp/pytest-of-jjelen/pytest-35/popen-gw0/test_get_large0/dst-file.txt')
large_payload = (b'&PE~~,!=1@F7RaUccg#)q-ru^~f8Zv/-{~tFE-\\>v2_b}i@`N/]_%*~{oD`"F`*4Ns]HK^le'
[...]
 b"?8]9KFN4@QAF8hD, ~T@WNP;)%+>)TV'ODi9")
src_path_large = PosixPath('/tmp/pytest-of-jjelen/pytest-35/popen-gw0/test_get_large0/large.txt')
ssh_scp    = <pylibsshext.scp.SCP object at 0x7f3f83cffac0>


tests/unit/scp_test.py:111: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   raise LibsshSCPException("Unexpected request: %s" % self._get_ssh_error_str())
E   pylibsshext.errors.LibsshSCPException: Unexpected request: b'ssh_scp_pull_request called under invalid state'


src/pylibsshext/scp.pyx:148: LibsshSCPException

(lines do not match)

Additionally, there is one more test case, that I suspected will be problematic, but it turned out it works just ok, so leaving for the coverage.

@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Jun 21, 2024
Copy link

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/ansible-pylibssh-621
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@webknjaz
Copy link
Member

@Jakuje FYI the linting failures block this PR

src/pylibsshext/scp.pyx Outdated Show resolved Hide resolved
src/pylibsshext/scp.pyx Outdated Show resolved Hide resolved
src/pylibsshext/scp.pyx Outdated Show resolved Hide resolved
src/pylibsshext/scp.pyx Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
@Jakuje
Copy link
Contributor Author

Jakuje commented Jun 26, 2024

The suggestions accepted, the linting should pass now.

tests/unit/scp_test.py Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
src/pylibsshext/scp.pyx Outdated Show resolved Hide resolved
src/pylibsshext/scp.pyx Outdated Show resolved Hide resolved
src/pylibsshext/scp.pyx Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
@webknjaz webknjaz added the bug Something isn't working label Jun 26, 2024
tests/unit/scp_test.py Outdated Show resolved Hide resolved
tests/unit/scp_test.py Outdated Show resolved Hide resolved
@webknjaz
Copy link
Member

@Jakuje hey, so I added a few improvements on top and posted a few more comments inline. I've now stopped pushing to the PR branch so feel free to clean it up/rebase/make more changes.

@webknjaz
Copy link
Member

This needs conflict resolution now.

@Jakuje
Copy link
Contributor Author

Jakuje commented Jun 27, 2024

Resolved the conflict and dropped the check for too large replies.

Copy link
Member

@webknjaz webknjaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@webknjaz webknjaz enabled auto-merge June 27, 2024 16:07
@webknjaz
Copy link
Member

Looks like one of the CI jobs got stuck with #557 so I restarted it: https://github.com/ansible/pylibssh/actions/runs/9699687288/job/26769948869?pr=621#step:15:152.

@webknjaz webknjaz merged commit 3888862 into ansible:devel Jun 27, 2024
161 of 165 checks passed
@webknjaz
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bot:chronographer:provided There is a change note present in this PR bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants