Panda's read_csv gives an error: Type of "read_csv" is partially unknown #230

ldorigo · 2020-08-12T12:36:14Z

Environment data

Language Server version: 2020.8.0
OS and version: Windows 10
Python version (& distribution if applicable, e.g. Anaconda): Python 3.8.3 (anaconda)

Expected behaviour

No error should be reported when read_csv is used.

Actual behaviour

Whenever I use pd.read_csv(), Pylance shows the following error:

Type of "read_csv" is partially unknown
  Type of "read_csv" is "Overload[(reader: IO[Unknown], sep: str = ..., delimiter: str | None = ..., header: int | List[int] | Literal['infer'] = ..., names: Sequence[str] | None = ..., index_col: int | str | Sequence[Unknown] | Literal[False] | None = ..., usecols: int | str | Sequence[Unknown] | None = ..., squeeze: bool = ..., prefix: str | None = ..., mangle_dupe_cols: bool = ..., dtype: str | Dict[str, Any] | None = ..., engine: Literal['c', 'python'] | None = ..., converters: Dict[int | str, (*args, **kwargs) -> Unknown] | None = ..., true_values: List[str | bytes | date | datetime | timedelta | bool | int | float | complex] | None = ..., false_values: List[str | bytes | date | datetime | timedelta | bool | int | float | complex] | None = ..., skipinitialspace: bool = ..., skiprows: Sequence[Unknown] | int | (*args, **kwargs) -> Unknown | None = ..., skipfooter: int = ..., nrows: int | None = ..., na_values: Any | None = ..., keep_default_na: bool = ..., na_filter: bool = ..., verbose: bool = ..., skip_blank_lines: bool = ..., parse_dates: bool = ..., infer_datetime_format: bool = ..., keep_date_col: bool = ..., date_parser: (*args, **kwargs) -> Unknown | None = ..., dayfirst: bool = ..., cache_dates: bool = ..., iterator: bool = ..., chunksize: int | None = ..., compression: Literal['infer', 'gzip', 'bz2', 'zip', 'xz'] | None = ..., thousands: str | None = ..., decimal: str | None = ..., lineterminator: str | None = ..., quotechar: str = ..., quoting: int = ..., doublequote: bool = ..., escapechar: str | None = ..., comment: str | None = ..., encoding: str | None = ..., dialect: str | None = ..., error_bad_lines: bool = ..., warn_bad_lines: bool = ..., delim_whitespace: bool = ..., low_memory: bool = ..., memory_map: bool = ..., float_precision: str | None = ...) 
-> TextParser, (filepath: str | Path, sep: str = ..., delimiter: str | None = ..., header: int | List[int] | Literal['infer'] = ..., names: Sequence[str] | None = ..., index_col: int | str | Sequence[Unknown] | Literal[False] | None = ..., usecols: int | str | Sequence[Unknown] | None = ..., squeeze: bool = ..., prefix: str | None = ..., mangle_dupe_cols: bool = ..., dtype: str | Dict[str, Any] | None = ..., engine: Literal['c', 'python'] | None = ..., converters: Dict[int | str, (*args, **kwargs) -> Unknown] | None = ..., true_values: List[str | bytes | date | datetime | timedelta | bool | int | float | complex] | None = ..., false_values: List[str | bytes | date | datetime | timedelta | bool | int | float | complex] | None = ..., skipinitialspace: bool = ..., skiprows: Sequence[Unknown] | int | (*args, **kwargs) -> Unknown | None = ..., skipfooter: int = ..., nrows: int | None = ..., na_values: Any | None = ..., keep_default_na: bool = ..., na_filter: bool = ..., verbose: bool = ..., skip_blank_lines: bool = ..., parse_dates: bool = ..., infer_datetime_format: bool = ..., keep_date_col: bool = ..., date_parser: (*args, **kwargs) -> Unknown | None = ..., dayfirst: bool = ..., cache_dates: bool = ..., iterator: bool = ..., chunksize: int | None = ..., compression: Literal['infer', 'gzip', 'bz2', 'zip', 'xz'] | None = ..., thousands: str | None = ..., decimal: str | None = ..., lineterminator: str | None = ..., quotechar: str = ..., quoting: int = ..., doublequote: bool = ..., escapechar: str | None = ..., comment: str | None = ..., encoding: str | None = ..., dialect: str | None = ..., error_bad_lines: bool = ..., warn_bad_lines: bool = ..., delim_whitespace: bool = ..., low_memory: bool = ..., memory_map: bool = ..., float_precision: str | None = ...) -> DataFrame]"Pylance (reportUnknownMemberType)

Logs

I don't think logs are necessary; if they are, I will be happy to add them.

Code Snippet / Additional information

import pandas as pd

test_df = pd.read_csv("file.csv")


ldorigo · 2020-08-12T12:41:03Z

Experimenting a bit more, this also happens with other pandas functions:

.isin():

Type of "isin" is partially unknown
  Type of "isin" is "(values: Iterable[Unknown] | Series[_DType] | Dict[Unknown, Unknown]) -> Series[_bool]"

.dropna():

Type of "dropna" is partially unknown
  Type of "dropna" is "Overload[(axis: str | int = ..., how: Literal['any', 'all'] = ..., thresh: int | None = ..., subset: List[Unknown] | None = ..., *, inplace: Literal[True]) -> None, (axis: str | int = ..., how: Literal['any', 'all'] = ..., thresh: int | None = ..., subset: List[Unknown] | None = ..., inplace: Literal[False] | None = ...) -> DataFrame]"

erictraut · 2020-08-12T16:21:52Z

The "partially unknown" error should appear only if you enable "strict" type checking. In this mode, Pylance will report any cases where types are unknown or partially unknown. In this case, the pandas type stub has incomplete information. It is not providing type arguments in some cases (e.g. it uses "Iterable" rather than "Iterable[Type]").

The pandas type stubs are still under development and have known holes like this. Until they are complete, you will not be able to use them with "strict" type checking.
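To make the "Iterable" versus "Iterable[Type]" point concrete, here is a minimal sketch using only the standard library. The function names count_loose and count_precise are hypothetical and are not taken from the pandas stubs; they only illustrate how a bare generic carries no type argument for a checker to work with.

```python
from typing import Iterable, get_args, get_type_hints


def count_loose(values: Iterable) -> int:
    # Bare generic: no element type is given, so a strict checker
    # treats the element type as Unknown.
    return sum(1 for _ in values)


def count_precise(values: Iterable[str]) -> int:
    # Parameterized generic: the element type is fully specified.
    return sum(1 for _ in values)


# Inspect the annotations at runtime to see the difference.
loose_args = get_args(get_type_hints(count_loose)["values"])
precise_args = get_args(get_type_hints(count_precise)["values"])
print(loose_args)    # no type arguments recorded
print(precise_args)  # one type argument: str
```

Both functions behave identically at runtime; the difference matters only to the type checker, which is why the gap shows up only under "strict".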

jakebailey · 2020-08-12T17:19:29Z

These stubs will be much, much improved in the next release as well.

jakebailey · 2020-08-12T17:58:57Z

Though there is one thing here that I think might be unintended; we're showing an error related to the definition of a stub in a user file. If the stub is "wrong", I'm not sure that this error is actionable from the use side of the call.

ldorigo · 2020-08-13T07:51:01Z

Yes, I enabled strict mode. I suspected it was an issue with the stub files, but as @jakebailey says, this shouldn't show as an error since the problem is not under the user's control. Even in strict mode, it should probably just show a warning stating that there is a problem with the stubs (and that the user can't do anything about it).

erictraut · 2020-08-13T15:06:51Z

The whole purpose of strict mode is to inform the user that there is a "hole" in type checking. If you want to adjust the diagnostic severities for specific rules (e.g. change an error into a warning), you can configure it as such.
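As a rough sketch of what such a configuration could look like, the snippet below builds the contents of a pyrightconfig.json that keeps strict mode but downgrades this one rule to a warning. It assumes the project is configured through pyrightconfig.json; the helper name build_pyright_config is hypothetical. In VS Code, the python.analysis.diagnosticSeverityOverrides setting serves the same purpose.

```python
import json
from typing import Dict


def build_pyright_config() -> Dict[str, str]:
    # Keep strict checking overall, but report partially unknown
    # member types as warnings instead of errors.
    return {
        "typeCheckingMode": "strict",
        "reportUnknownMemberType": "warning",
    }


# Print the JSON that would go into pyrightconfig.json.
print(json.dumps(build_pyright_config(), indent=2))
```

Other "reportUnknown..." rules can be adjusted the same way if they also fire on stub gaps.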

ldorigo · 2020-08-14T06:39:21Z

Fair enough, I get your point. Feel free to close this. Although I still think it would be nice to make it clear to the user that the problem stems from the stubs, and not from his own code.

jakebailey · 2020-08-14T18:43:13Z

That can turn out to be difficult, unfortunately. For example, if a library has declared a generic, and the user code misuses it and forgets to specify one of the types in their annotations, then from the type checker's point of view it's not much different than the above case because there's something in the current code that's unknown. I'm not sure it's possible to know for certain who is "at fault" when the unknown happens...
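A small sketch of the user-side case described above, with hypothetical names (Box, unwrap_loose, unwrap_precise). Box stands in for a generic declared by a library; the first function is user code that forgets the type argument, which leaves the checker with the same kind of unknown as an incomplete stub would.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Box(Generic[T]):
    # Stand-in for a generic class declared by a library.
    def __init__(self, item: T) -> None:
        self.item = item


def unwrap_loose(box: Box):
    # User forgot the type argument: the checker cannot tell
    # what type box.item has.
    return box.item


def unwrap_precise(box: Box[int]) -> int:
    # Type argument supplied: box.item is known to be int.
    return box.item


print(unwrap_loose(Box(3)), unwrap_precise(Box(3)))
```

In both situations the checker only sees that a type argument is missing at the point of use, not whose code omitted it.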

jakebailey · 2020-10-02T20:47:05Z

@ldorigo Is this still the case in recent releases? Our pandas stubs have been improved a number of times since this issue was created.

savannahostrowski · 2020-11-02T18:02:36Z

This issue has been waiting for a follow up for 30 days. Because we haven't heard back, we'll be closing this ticket. Feel free to reach out if this is still a problem!

Donnerstagnacht · 2024-08-08T13:08:36Z

Am I the only one who still has this issue? Is my type setup not working, or will this error still occur in strict mode?

What is the workaround if I would like to use pandas in a strict environment?

debonte · 2024-08-08T16:08:40Z

@Donnerstagnacht, below is the signature of read_csv as Pylance sees it. Note the Unknown types, which from a cursory check seem to be primarily from generics without type args. Your question is really a question for the pandas-stubs team. There's already an issue there that touches on this. Note Dr-Irv's comment there that they only test and support "basic" mode in Pyright.

(function) def read_csv(
    filepath_or_buffer: FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str],
    *,
    sep: str | None = ...,
    delimiter: str | None = ...,
    header: int | Sequence[int] | Literal['infer'] | None = ...,
    names: ListLikeHashable[Unknown] | None = ...,
    index_col: int | str | Sequence[str | int] | Literal[False] | None = ...,
    usecols: UsecolsArgType[Unknown] = ...,
    dtype: DtypeArg | defaultdict[Unknown, Unknown] | None = ...,
    engine: CSVEngine | None = ...,
    converters: Mapping[int | str, (str) -> Any] | Mapping[int, (str) -> Any] | Mapping[str, (str) -> Any] | None = ...,
    true_values: list[str] = ...,
    false_values: list[str] = ...,
    skipinitialspace: bool = ...,
    skiprows: int | Sequence[int] | ((int) -> bool) = ...,
    skipfooter: int = ...,
    nrows: int | None = ...,
    na_values: Sequence[str] | Mapping[str, Sequence[str]] = ...,
    keep_default_na: bool = ...,
    na_filter: bool = ...,
    verbose: bool = ...,
    skip_blank_lines: bool = ...,
    parse_dates: bool | list[int] | list[str] | Sequence[Sequence[int]] | Mapping[str, Sequence[int | str]] = ...,
    infer_datetime_format: bool = ...,
    keep_date_col: bool = ...,
    date_format: dict[Hashable, str] | str | None = ...,
    dayfirst: bool = ...,
    cache_dates: bool = ...,
    iterator: Literal[False] = ...,
    chunksize: None = ...,
    compression: CompressionOptions = ...,
    thousands: str | None = ...,
    decimal: str = ...,
    lineterminator: str | None = ...,
    quotechar: str = ...,
    quoting: CSVQuoting = ...,
    doublequote: bool = ...,
    escapechar: str | None = ...,
    comment: str | None = ...,
    encoding: str | None = ...,
    encoding_errors: str | None = ...,
    dialect: str | Dialect = ...,
    on_bad_lines: ((list[str]) -> (list[str] | None)) | Literal['error', 'warn', 'skip'] = ...,
    delim_whitespace: bool = ...,
    low_memory: bool = ...,
    memory_map: bool = ...,
    float_precision: Literal['high', 'legacy', 'round_trip'] | None = ...,
    storage_options: StorageOptions = ...,
    dtype_backend: DtypeBackend | Literal[_NoDefault.no_default] = ...
) -> DataFrame
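On the workaround question, one possible approach (a sketch under assumptions, not a recommendation from the Pylance or pandas-stubs teams) is to isolate the call in a small typed helper and suppress the rule on that single line with a "# pyright: ignore" comment. The helper name load_csv is hypothetical. pandas is imported lazily so the module itself imports without it.

```python
from pathlib import Path
from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Only needed by the type checker for the return annotation.
    import pandas as pd


def load_csv(path: Union[str, Path]) -> "pd.DataFrame":
    # Import inside the function so pandas is required only when called.
    import pandas as pd

    # Suppress the partially-unknown report for this one call only.
    return pd.read_csv(path)  # pyright: ignore[reportUnknownMemberType]
```

The rest of the code base can then call load_csv("file.csv") and stay in strict mode, with the suppression confined to one place.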

github-actions bot added the triage label Aug 12, 2020

jakebailey added the bug and needs investigation labels Aug 12, 2020

github-actions bot removed the triage label Aug 12, 2020

jakebailey added the waiting for user response label and removed the bug and needs investigation labels Oct 2, 2020

savannahostrowski closed this as completed Nov 2, 2020
