pandas' read_csv gives an error: Type of "read_csv" is partially unknown #230

Closed
ldorigo opened this issue Aug 12, 2020 · 12 comments
Labels
waiting for user response Requires more information from user

Comments

@ldorigo

ldorigo commented Aug 12, 2020

Environment data

  • Language Server version: 2020.8.0
  • OS and version: Windows 10
  • Python version (& distribution if applicable, e.g. Anaconda): Python 3.8.3 (anaconda)

Expected behaviour

There shouldn't be an error whenever read_csv is used?

Actual behaviour

Whenever I use pd.read_csv(), Pylance shows the following error:

Type of "read_csv" is partially unknown
  Type of "read_csv" is "Overload[(reader: IO[Unknown], sep: str = ..., delimiter: str | None = ..., header: int | List[int] | Literal['infer'] = ..., names: Sequence[str] | None = ..., index_col: int | str | Sequence[Unknown] | Literal[False] | None = ..., usecols: int | str | Sequence[Unknown] | None = ..., squeeze: bool = ..., prefix: str | None = ..., mangle_dupe_cols: bool = ..., dtype: str | Dict[str, Any] | None = ..., engine: Literal['c', 'python'] | None = ..., converters: Dict[int | str, (*args, **kwargs) -> Unknown] | None = ..., true_values: List[str | bytes | date | datetime | timedelta | bool | int | float | complex] | None = ..., false_values: List[str | bytes | date | datetime | timedelta | bool | int | float | complex] | None = ..., skipinitialspace: bool = ..., skiprows: Sequence[Unknown] | int | (*args, **kwargs) -> Unknown | None = ..., skipfooter: int = ..., nrows: int | None = ..., na_values: Any | None = ..., keep_default_na: bool = ..., na_filter: bool = ..., verbose: bool = ..., skip_blank_lines: bool = ..., parse_dates: bool = ..., infer_datetime_format: bool = ..., keep_date_col: bool = ..., date_parser: (*args, **kwargs) -> Unknown | None = ..., dayfirst: bool = ..., cache_dates: bool = ..., iterator: bool = ..., chunksize: int | None = ..., compression: Literal['infer', 'gzip', 'bz2', 'zip', 'xz'] | None = ..., thousands: str | None = ..., decimal: str | None = ..., lineterminator: str | None = ..., quotechar: str = ..., quoting: int = ..., doublequote: bool = ..., escapechar: str | None = ..., comment: str | None = ..., encoding: str | None = ..., dialect: str | None = ..., error_bad_lines: bool = ..., warn_bad_lines: bool = ..., delim_whitespace: bool = ..., low_memory: bool = ..., memory_map: bool = ..., float_precision: str | None = ...) -> TextParser, (filepath: str | Path, sep: str = ..., delimiter: str | None = ..., header: int | List[int] | Literal['infer'] = ..., names: Sequence[str] | None = ..., index_col: int | str | Sequence[Unknown] | Literal[False] | None = ..., usecols: int | str | Sequence[Unknown] | None = ..., squeeze: bool = ..., prefix: str | None = ..., mangle_dupe_cols: bool = ..., dtype: str | Dict[str, Any] | None = ..., engine: Literal['c', 'python'] | None = ..., converters: Dict[int | str, (*args, **kwargs) -> Unknown] | None = ..., true_values: List[str | bytes | date | datetime | timedelta | bool | int | float | complex] | None = ..., false_values: List[str | bytes | date | datetime | timedelta | bool | int | float | complex] | None = ..., skipinitialspace: bool = ..., skiprows: Sequence[Unknown] | int | (*args, **kwargs) -> Unknown | None = ..., skipfooter: int = ..., nrows: int | None = ..., na_values: Any | None = ..., keep_default_na: bool = ..., na_filter: bool = ..., verbose: bool = ..., skip_blank_lines: bool = ..., parse_dates: bool = ..., infer_datetime_format: bool = ..., keep_date_col: bool = ..., date_parser: (*args, **kwargs) -> Unknown | None = ..., dayfirst: bool = ..., cache_dates: bool = ..., iterator: bool = ..., chunksize: int | None = ..., compression: Literal['infer', 'gzip', 'bz2', 'zip', 'xz'] | None = ..., thousands: str | None = ..., decimal: str | None = ..., lineterminator: str | None = ..., quotechar: str = ..., quoting: int = ..., doublequote: bool = ..., escapechar: str | None = ..., comment: str | None = ..., encoding: str | None = ..., dialect: str | None = ..., error_bad_lines: bool = ..., warn_bad_lines: bool = ..., delim_whitespace: bool = ..., low_memory: bool = ..., memory_map: bool = ..., float_precision: str | None = ...) -> DataFrame]" Pylance (reportUnknownMemberType)

Logs

I don't think logs are necessary; if they are, I'll be happy to add them.

Code Snippet / Additional information

import pandas as pd

test_df = pd.read_csv("file.csv")
@ldorigo
Author

ldorigo commented Aug 12, 2020

Experimenting a bit more, this also happens with other pandas functions:

.isin():

Type of "isin" is partially unknown
  Type of "isin" is "(values: Iterable[Unknown] | Series[_DType] | Dict[Unknown, Unknown]) -> Series[_bool]"

.dropna():

Type of "dropna" is partially unknown
  Type of "dropna" is "Overload[(axis: str | int = ..., how: Literal['any', 'all'] = ..., thresh: int | None = ..., subset: List[Unknown] | None = ..., *, inplace: Literal[True]) -> None, (axis: str | int = ..., how: Literal['any', 'all'] = ..., thresh: int | None = ..., subset: List[Unknown] | None = ..., inplace: Literal[False] | None = ...) -> DataFrame]"

@erictraut
Contributor

The "partially unknown" error should appear only if you enable "strict" type checking. In this mode, Pylance will report any cases where types are unknown or partially unknown. In this case, the pandas type stub has incomplete information. It is not providing type arguments in some cases (e.g. it uses "Iterable" rather than "Iterable[Type]").

The pandas type stubs are still under development and have known holes like this. Until they are complete, you will not be able to use them with "strict" type checking.
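To make "not providing type arguments" concrete, here is a minimal sketch that does not touch pandas at all. The two stub-style functions, `isin_bare` and `isin_typed`, are hypothetical names invented for illustration: one is annotated with a bare `Iterable`, the other with `Iterable[object]`. Inspecting the annotations at runtime shows that the bare form carries no type argument; that empty slot is what a strict-mode checker reports as `Unknown` (compare `Iterable[Unknown]` in the `isin` hover text earlier in this thread).

```python
from collections.abc import Callable, Iterable
from typing import get_args, get_type_hints


# Stub-style declarations (bodies omitted, as in a .pyi file).
def isin_bare(values: Iterable) -> bool: ...  # element type left unspecified


def isin_typed(values: Iterable[object]) -> bool: ...  # element type stated


def type_args(func: Callable[..., object]) -> tuple[object, ...]:
    """Return the type arguments on the annotation of the `values` parameter."""
    return get_args(get_type_hints(func)["values"])


print(type_args(isin_bare))   # ()
print(type_args(isin_typed))  # (<class 'object'>,)
```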

@jakebailey
Member

These stubs will be much, much improved in the next release as well.

@jakebailey
Member

Though there is one thing here that I think might be unintended; we're showing an error related to the definition of a stub in a user file. If the stub is "wrong", I'm not sure that this error is actionable from the use side of the call.

@jakebailey jakebailey added bug Something isn't working needs investigation Could be an issue - needs investigation labels Aug 12, 2020
@ldorigo
Author

ldorigo commented Aug 13, 2020

Yes, I enabled strict mode. I suspected it was an issue with the stub files, but as @jakebailey says, this shouldn't show as an error since the problem is not under the user's control. Even in strict mode, it should probably just show a warning stating that there is a problem with the stubs (and that it isn't something the user can fix).

@erictraut
Copy link
Contributor

The whole purpose of strict mode is to inform the user that there is a "hole" in type checking. If you want to adjust the diagnostic severities for specific rules (e.g. change an error into a warning), you can configure it as such.
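As a sketch of that kind of adjustment, assuming a project-level `pyrightconfig.json` (a file Pylance also reads), the snippet below renders a config that keeps strict mode but downgrades the unknown-type rules to warnings. `typeCheckingMode` and the `reportUnknown*` rule names are real Pyright settings; which rules to downgrade, and to which severity, is a per-project choice, and the helper name `render_pyright_config` is made up for this example. In VS Code the equivalent per-user setting is `python.analysis.diagnosticSeverityOverrides`.

```python
import json

# Keep strict checking overall, but report "unknown type" findings as
# warnings rather than errors, since many of them originate in third-party stubs.
PYRIGHT_CONFIG: dict[str, str] = {
    "typeCheckingMode": "strict",
    "reportUnknownMemberType": "warning",
    "reportUnknownVariableType": "warning",
    "reportUnknownArgumentType": "warning",
}


def render_pyright_config() -> str:
    """Serialize the settings above as the contents of a pyrightconfig.json file."""
    return json.dumps(PYRIGHT_CONFIG, indent=2) + "\n"


print(render_pyright_config())
```

Writing the result to `pyrightconfig.json` at the project root (for example with `pathlib.Path.write_text`) is enough for Pyright and Pylance to pick it up.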

@ldorigo
Author

ldorigo commented Aug 14, 2020

Fair enough, I get your point. Feel free to close this, although I still think it would be nice to make it clear to the user that the problem stems from the stubs and not from their own code.

@jakebailey
Member

That can turn out to be difficult, unfortunately. For example, if a library has declared a generic, and the user code misuses it by forgetting to specify one of the type arguments in an annotation, then from the type checker's point of view it's not much different from the case above: there's something in the current code that's unknown. I'm not sure it's possible to know for certain who is "at fault" when the unknown happens...
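A minimal sketch of the ambiguity being described, using a hypothetical generic class `Box` as a stand-in for a library type. The "library" side is fully annotated; the user side forgets the type argument in one annotation. Under strict mode, Pyright would be expected to flag the bare `Box` annotation and treat what `box.get()` returns as unknown, even though nothing is wrong on the library side, so the unknown cannot simply be blamed on the stubs. At runtime the two functions behave identically.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Box(Generic[T]):
    """Hypothetical library type; fully annotated on the library side."""

    def __init__(self, item: T) -> None:
        self.item = item

    def get(self) -> T:
        return self.item


def unwrap_typed(box: Box[int]) -> int:
    # Type argument supplied: a checker knows box.get() returns int.
    return box.get()


def unwrap_bare(box: Box) -> int:
    # Type argument forgotten in *user* code: in strict mode a checker sees
    # Box[Unknown] here, so box.get() is unknown even though Box is fully typed.
    return box.get()


# At runtime the two are indistinguishable; only static analysis differs.
print(unwrap_typed(Box(3)), unwrap_bare(Box(3)))  # 3 3
```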

@jakebailey
Member

@ldorigo Is this still the case in recent releases? Our pandas stubs have been improved a number of times since this issue was created.

@jakebailey jakebailey added waiting for user response Requires more information from user and removed bug Something isn't working needs investigation Could be an issue - needs investigation labels Oct 2, 2020
@savannahostrowski
Contributor

This issue has been waiting for a follow up for 30 days. Because we haven't heard back, we'll be closing this ticket. Feel free to reach out if this is still a problem!

@Donnerstagnacht

Donnerstagnacht commented Aug 8, 2024

Am I the only one who still has this issue? Is my type setup not working, or will this error still occur in strict mode?

What is the workaround if I would like to use pandas in a strict environment?

@debonte
Contributor

debonte commented Aug 8, 2024

@Donnerstagnacht, below is the signature of read_csv as Pylance sees it. Note the Unknown types, which from a cursory check seem to be primarily from generics without type args. Your question is really a question for the pandas-stubs team. There's already an issue there that touches on this. Note Dr-Irv's comment there that they only test and support "basic" mode in Pyright.

(function) def read_csv(
    filepath_or_buffer: FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str],
    *,
    sep: str | None = ...,
    delimiter: str | None = ...,
    header: int | Sequence[int] | Literal['infer'] | None = ...,
    names: ListLikeHashable[Unknown] | None = ...,
    index_col: int | str | Sequence[str | int] | Literal[False] | None = ...,
    usecols: UsecolsArgType[Unknown] = ...,
    dtype: DtypeArg | defaultdict[Unknown, Unknown] | None = ...,
    engine: CSVEngine | None = ...,
    converters: Mapping[int | str, (str) -> Any] | Mapping[int, (str) -> Any] | Mapping[str, (str) -> Any] | None = ...,
    true_values: list[str] = ...,
    false_values: list[str] = ...,
    skipinitialspace: bool = ...,
    skiprows: int | Sequence[int] | ((int) -> bool) = ...,
    skipfooter: int = ...,
    nrows: int | None = ...,
    na_values: Sequence[str] | Mapping[str, Sequence[str]] = ...,
    keep_default_na: bool = ...,
    na_filter: bool = ...,
    verbose: bool = ...,
    skip_blank_lines: bool = ...,
    parse_dates: bool | list[int] | list[str] | Sequence[Sequence[int]] | Mapping[str, Sequence[int | str]] = ...,
    infer_datetime_format: bool = ...,
    keep_date_col: bool = ...,
    date_format: dict[Hashable, str] | str | None = ...,
    dayfirst: bool = ...,
    cache_dates: bool = ...,
    iterator: Literal[False] = ...,
    chunksize: None = ...,
    compression: CompressionOptions = ...,
    thousands: str | None = ...,
    decimal: str = ...,
    lineterminator: str | None = ...,
    quotechar: str = ...,
    quoting: CSVQuoting = ...,
    doublequote: bool = ...,
    escapechar: str | None = ...,
    comment: str | None = ...,
    encoding: str | None = ...,
    encoding_errors: str | None = ...,
    dialect: str | Dialect = ...,
    on_bad_lines: ((list[str]) -> (list[str] | None)) | Literal['error', 'warn', 'skip'] = ...,
    delim_whitespace: bool = ...,
    low_memory: bool = ...,
    memory_map: bool = ...,
    float_precision: Literal['high', 'legacy', 'round_trip'] | None = ...,
    storage_options: StorageOptions = ...,
    dtype_backend: DtypeBackend | Literal[_NoDefault.no_default] = ...
) -> DataFrame
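Several of the Unknowns above sit on parameterized aliases, for example `ListLikeHashable[Unknown]` and `UsecolsArgType[Unknown]`. The sketch below uses a hypothetical alias `ListLike` (not the actual pandas-stubs definition) to show the general mechanism: a generic alias built from a TypeVar keeps that TypeVar as a free parameter until it is subscripted, so a checker that meets the bare alias has nothing to substitute and reports the slot as Unknown.

```python
from collections.abc import Hashable
from typing import TypeVar, Union, get_args

# Hypothetical alias for illustration only; it is NOT the pandas-stubs definition.
H = TypeVar("H", bound=Hashable)
ListLike = Union[list[H], set[H]]


def free_params(alias: object) -> tuple[object, ...]:
    """Return the type variables an alias still leaves unsolved."""
    return getattr(alias, "__parameters__", ())


print(free_params(ListLike))       # (~H,)
print(free_params(ListLike[str]))  # ()
print(get_args(ListLike[str]))     # (list[str], set[str])
```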
