-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Added tests for the function load_pathlist_from_file
#1469
Conversation
Codecov Report
@@ Coverage Diff @@
## latest #1469 +/- ##
==========================================
+ Coverage 90.20% 95.20% +4.99%
==========================================
Files 126 99 -27
Lines 21194 17553 -3641
Branches 1595 1600 +5
==========================================
- Hits 19118 16711 -2407
+ Misses 1844 609 -1235
- Partials 232 233 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
I wrote 2 tests for Badly Formatted Files and 1 test for Duplicate Files. I came across some interesting errors that need to be worked on. |
hi @keyabarve could you summarize the errors here in the comments, and (if you're up to it) propose some solutions? thanks! |
@ctb @luizirber
|
tests/test_sourmash.py
Outdated
from sourmash.sourmash_args import load_pathlist_from_file | ||
with pytest.raises(ValueError) as e: | ||
load_pathlist_from_file("") | ||
assert "cannot open file ''" in e.message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note, you should de-indent here, because the assert
statement is never reached - this is because load_pathlist_from_file
raises a ValueError, which ends the block and is caught by the with
statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which lines should I de-indent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll ask a question instead ;).
try putting an assert 0
after the load_pathlist_from_file
call - does the test fail, as you'd expect it would?
also see:
- asserting with the
assert
statement with pytest.raises
documentation- [context managers](https://www.python.org/dev/peps/pep-0343/#id36
python exceptions and try/except blocks may also be useful.
And note that you can put print
statements in tests and the output will be displayed only when the test fails, so they're useful for debugging tests.
thanks! These are good checks, and I agree To fix them, you'd want to have For example, to fix the Is this something you want to pursue? |
Yes, I would like to continue working on this. I will try fixing it. |
excellent! I think a good high level summary of what to do overall is "take any of these conditions or errors and turn them into statements that raise |
@ctb @luizirber
|
On Mon, Apr 26, 2021 at 05:17:55PM -0700, Keya Barve wrote:
- What exactly does the argument "filename" passed to the function `load_pathlist_from_file` contain? Is "filename" a single file, which contains a list of files?
yes.
- What does the variable "file_list" that is returned by this function contain? Is it an array containing the list of files from "filename"?
that is defined by the function, right? there's no mystery here, you have
the full source code :).
(what is the answer, based on the current code?)
- When we are checking for all the test cases (file doesn't exist, empty file, badly formatted file, and duplicate file), are we checking all these for each of the files inside "filename"?
I believe that most of those conditions are handled by existing test code, so
you don't need to check them explicitly.
The only exception is duplicate file. I think perhaps we should collapse
duplicate entries in the file by returning a set rather than a list.
- So, while writing the different if statements in this function, should I loop through "file_list" to check each of the files?
Nope.
You might be interested in reading up on "duck typing" in Python, BTW;
as it is, the load_pathlist function takes any stringlike object and
returns an iterable, but those are the only operational constraints at
the moment. And that's intentional; there's no reason to be more specific
about what this function takes and returns.
|
Alright, thanks! |
Related discussion: #1410 |
@ctb @luizirber
Could you provide some tips on how I can fix this? |
On Fri, Apr 30, 2021 at 11:24:24AM -0700, Keya Barve wrote:
@ctb @luizirber
Whenever I test any of the test cases, it gives me a type error, saying ```TypeError: stat: path should be string, bytes, os.PathLike or integer, not list```
This error is coming from the function:
```def exists(path):
"""Test whether a path exists. Returns False for broken symbolic links"""
try:
> os.stat(path)
E TypeError: stat: path should be string, bytes, os.PathLike or integer, not list```
what is path?
do:
print((path,), type(path))
|
Where should I write this print statement? Should I write it in |
On Fri, Apr 30, 2021 at 11:28:34AM -0700, Keya Barve wrote:
Where should I write this print statement? Should I write it in `load_pathlist_from_file`? There is no path variable in this function, so I'm not sure if it will print anything.
either where you call load_pathlist_from_file, or in load_pathlist from file.
please remember you control all the code here, so you can print out variables
where you want :)
|
@ctb @luizirber |
what about collapsing We probably don't need to emit a warning in the case of duplicate files, because if you passed duplicate files into most sourmash commands, sourmash would invisibly deduplicate them anyway in search/gather/sketch). |
What do you mean by collapsing Also, should I remove the test case for the duplicate files entirely? |
On Fri, Apr 30, 2021 at 12:58:09PM -0700, Keya Barve wrote:
> > @ctb @luizirber
> > I think the code works fine for all the test cases except for the duplicate files. Since I have created a loop, should I remove the `test_load_pathlist_from_file_badly_formatted_2` test case?
>
> what about collapsing `file_list` automatically, using an [ordered dictionary](https://docs.python.org/3/library/collections.html#collections.OrderedDict) so that the files remain in the same order but get collapsed? You could also do that within your for loop using a `set` underneath to see if we've already seen a file.
>
> We probably don't need to emit a warning in the case of duplicate files, because if you passed duplicate files into most sourmash commands, sourmash would invisibly deduplicate them anyway in search/gather/sketch).
What do you mean by collapsing `file_list`? Should I be doing this for the duplicate files?
collapsing => ignoring duplicates.
Also, should I remove the test case for the duplicate files entirely?
I would only remove tests if they are entirely redundant or serve
no purpose whatsoever.
|
src/sourmash/sourmash_args.py
Outdated
|
||
if not os.path.exists(file_list[0]): | ||
raise ValueError("first element of list-of-files does not exist") | ||
if len(file_list) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note, this can be abbreviated more Pythonically as:
if not file list:
...
(in Python, empty lists evaluate to False)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll fix that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bump!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revisit this comment, thanks - https://github.com/dib-lab/sourmash/pull/1469/files#r624171173
src/sourmash/sourmash_args.py
Outdated
if not os.path.exists(file_list[i]): | ||
cnt = cnt + 1 | ||
if cnt > 0: | ||
raise ValueError("list-of-files contains a badly formatted file") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this error message right? You're checking if the file exists; if I were a user I would not know that "badly formatted" means "does not exist"
Also, suggest using "pathlist" rather than "list of files".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you suggest a way to check for badly formatted files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably fine to just say, "file '' does not exist".
Could you suggest a way to check for badly formatted files?
I'm afraid I don't really have any good suggestions off the top of my head!
|
@ctb @luizirber |
hi @keyabarve looks like there's still a test failing 😭 . Could you take a look and give it a stab, or ask some questions? |
@ctb Sure! I'll take a look at it! Which test is failing exactly? |
On Mon, May 17, 2021 at 06:25:53PM -0700, Keya Barve wrote:
@ctb Sure! I'll take a look at it! Which test is failing exactly?
you can dig around in the "some checks were not successful" to find out,
or run all the tests yourself, locally.
|
hi @keyabarve you could usefully add a test to catch this situation: #1537 |
sorry, a fix AND a test :) |
@ctb My latest code already takes care of the issue and I already have a test for it! :) |
@@ -199,7 +252,6 @@ def test_do_compare_quiet(c): | |||
testdata1 = utils.get_test_data('short.fa') | |||
testdata2 = utils.get_test_data('short2.fa') | |||
c.run_sourmash('compute', '-k', '31', testdata1, testdata2) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note - please try to avoid changes in bits of the code that aren't relevant to the issues tackled by this PR - it adds a bit of confusion to the review :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, it's probably something I did before and forgot to revert! Should I fix it and push again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need, just for future consideration!
Is this ready to be merged? Or is there anything else that needs to be fixed? |
I think this error message is misleading --
and should be replaced with the same text as below,
|
Is this in the |
@ctb This is the error I'm getting if I change the statement:
|
I'm not sure how to comment, other than to say that this logic looks inherently problematic to me :)
since the error message is saying "this is a badly formatted file" but the if statement is checking to see if that file exists... I think the useful thing to say is "this file doesn't exist". |
Hmm I see. Should I do this then: |
The problem that I'm facing is that it's giving me the error I mentioned above. I'm confused about how to fix that. |
The assert statement in my test case is this: |
On Fri, May 21, 2021 at 02:29:36PM -0700, Keya Barve wrote:
The assert statement in my test case is this: `assert "file '' inside the pathlist does not exist" in str(e.value)`. If I change that to `assert "file '{'a':1}' inside the pathlist does not exist"` in str(e.value), then the tests pass. So should I do that then?
Sure, give it a try.
|
Please review. @ctb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I'll update from latest and then one of us can merge when all the tests pass!
Nice work - I know it took a while ;)
load_pathlist_from_file
load_pathlist_from_file
Alright! Thanks! By merge, you mean 'Squash and merge' right? |
Ready for merge. @ctb |
🎉 |
Fixes #1430