Return dictionaries instead of tuples #959

Peque · 2020-04-26T11:30:57Z

I found it a bit inconvenient to have fit_sdm_cec_sam() return a tuple. I think returning a dictionary with ['I_L_ref', 'I_o_ref', 'R_sh_ref', 'R_s', 'a_ref', 'Adjust'] keys would be much more convenient and safe.

The same goes for calcparams_cec(), which could return a dictionary with ['photocurrent', 'saturation_current', 'resistance_series', 'resistance_shunt', 'nNsVth'] keys instead.

I think that is more Pythonic (explicit is better than implicit and readability counts).

Also, more convenient API-wise:

cec_parameters = fit_sdm_cec_sam(...)
sd_parameters = calcparams_cec(..., **cec_parameters)
curve_info = singlediode(..., **sd_parameters)

The tests could be simplified as well. They currently look like:

I_L_ref, I_o_ref, R_sh_ref, R_s, a_ref, Adjust = ivtools.fit_sdm_cec_sam(...)
expected = pd.Series(get_cec_params_cansol_cs5p_220p['output'])
modeled = pd.Series(index=expected.index, data=np.nan)
modeled['a_ref'] = a_ref
modeled['I_L_ref'] = I_L_ref
modeled['I_o_ref'] = I_o_ref
modeled['R_sh_ref'] = R_sh_ref
modeled['R_s'] = R_s
modeled['Adjust'] = Adjust
assert np.allclose(modeled.values, expected.values, rtol=5e-2)

They could look like:

modeled = ivtools.fit_sdm_cec_sam(...)
expected = get_cec_params_cansol_cs5p_220p['output']
assert modeled == pytest.approx(expected, rel=5e-2)

The text was updated successfully, but these errors were encountered:

kandersolar · 2020-04-27T01:20:02Z

For reference, tuples vs dicts was discussed briefly (with no immediate resolution) in this thread: #784 (comment)

Peque · 2020-04-27T07:30:54Z

PD: Since what I am proposing is a breaking change... I also think that reducing the number of parameters in a function is good (i.e.: curve_info = singlediode(..., sd_parameters=sd_parameters)), where sd_parameters is a variable containing all the SDM parameters. Although this could also be applied with a tuple instead of a dict.

mikofski · 2020-04-27T21:26:57Z

My preference would be to use keyword arguments instead, eg: curve_info = singlediode(..., a, b, c, d) = singlediode(..., **sd_parameters) where sd_parameters = {'a': 12.3, 'b': 4.56, 'c': 0.789, 'd': False}. I believe there are a couple of reasons why this could be more Pythonic:

The keyword arguments are explicit. If an argument is misspelled or missing, the function will raise an exception
Whereas I believe a dictionary argument would be implicit. It requires documentation to tell the user what the required keys should be. This is potentially error prone, one could mistype a key or miss one.
Using keywords allows the user all options, pass the dictionary using splat, as keywords, or positional.

wholmgren · 2020-04-27T21:43:00Z

In my experience maintaining pvlib, it's been easier to deal with things that require order (positional args and returned tuples) than things that require labels (keyword args and returned dicts or DataFrames). I do use a lot of dicts, Series, and DataFrames in my work, so I certainly appreciate their general value. It just seems they're harder for a more general purpose library. As I mentioned in the linked issue, I'd really like to read a blog post about this kind of API design question in scientific computing.

It's also not clear to me that you'd actually need to switch to keyword arguments to achieve the input simplification you're after. Examples showing that it can be done with positional arguments:

In [1]: def f(a, b):
   ...:     print(a, b)
   ...:

In [2]: f(b=2, a=1)
1 2

In [3]: d = dict(a=1, b=2)

In [4]: f(**d)
1 2

In [5]: t = (1, 2)

In [6]: f(*t)
1 2

The test simplification could work with a tuple too:

assert (1., 5.) == pytest.approx((1., 5.))

mikofski · 2020-04-27T22:49:39Z

Will you explained what I was trying to say, but much better, thanks!

Peque · 2020-04-28T18:50:51Z

@mikofski I have no strong opinion on packing/unpacking parameters. 😊

@wholmgren Yeah, tests simplification can be done either way, I should not have brought it to this issue. 👍

@wholmgren In my opinion positional args should be enforced then. That is possible in Python 3.8, but not before. Also, positional arguments must come before keyword arguments, and that may require further API modifications.

In example:

# Assuming `cec_params` is a tuple...
cec_params = fit_sdm_cec_sam(...)
# This works fine
calcparams_cec(
    1000,
    25,
    0.004,
    2.5,
    *cec_params,
    ...,
)
# This does not
calcparams_cec(
    effective_irradiance=1000,
    temp_cell=25,
    alpha_sc=0.004,
    a_ref=2.5,
    *cec_params,
    ...,
)

You can't just use keyword parameters before the cec_params expansion and, in my opinion, that makes the code less legible/explicit and the API less convenient/friendly to use. So, in my opinion it would be better to change the API to have *cec_params first, and then effective_irradiance, temp_cell, ....

To summarize, I think returning a dict is better and I do not mind that much about params=dictionary vs. **dictionary on function calls. But your project, your rules! Feel free to close this. 😜

adriesse · 2020-04-28T19:06:38Z

A worthy discussion.

mikofski · 2020-04-28T22:35:15Z

ICYMI: @kanderso-nrel and @CameronTStark - could be useful to get your opinions on this, if you care to weigh in.

kandersolar · 2020-04-28T23:46:29Z

On the subject of passing groups of values into a function as a unit vs individually -- if the values are coherent enough to interact as a group in several parts of the API, I wonder if they would be better grouped as a simple object like scipy's OptimizeResult instead of a dictionary.

For positional-only arguments, I don't think it's a good idea at this time. For one, it would require a big bump in minimum python version, and for two I don't think pvlib satisfies any of the use cases listed in PEP 570.

For returning dicts vs tuples... a 6-element tuple is a bit unwieldy. On the other hand, returning a dictionary for two values would seem silly. I notice fit_sdm_desoto returns a dictionary. But taking a step back, it seems like there will always be some complexity to the plumbing of these low-level interfaces, which suggests maybe they need a high-level interface. Just throwing out an idea, what about some sort of "PVDevice" class that behaves like this?

device = fit_device(v_mp, i_mp, ... model='cec')
# or
device = Device(I_L_ref, ..., model='cec')
# calls calcparams_x and singlediode under the hood:
curve_info = device.ivcurve(effective_irradiance, ...)

CameronTStark · 2020-05-04T23:50:13Z

@wholmgren 's description of the situation is good although I would prefer to have a dictionary on the output if there are more than two values being returned. This makes it possible to enable easy transfer of the output of one function into another without careful positioning.

In [1]: def outputs_dictionary():
         ...:     return dict(this="foo", that="bar", other_thing="baz")

In [2]: def uses_output_dictionary(first_arg, that, other_thing, this, last_arg):
         ...:     print(this, that, other_thing)

In [3]: example_dictionary = outputs_dictionary()

In [4]: example_dictionary
Out[4]: {'this': 'foo', 'that': 'bar', 'other_thing': 'baz'}

In [5]: uses_output_dictionary(42, **example_dictionary, last_arg=1337)
foo bar baz

In [6]: uses_output_dictionary(last_arg=1337, **example_dictionary, first_arg=42)
foo bar baz

This is useful in interactive data analysis as well as in programmatic scripting to unify the API. Having sensible defaults for these arguments, when possible, makes this even easier and is also useful in enabling new users a path forward when they'd otherwise have no idea what value to insert.

wholmgren · 2020-05-05T02:34:41Z

I don't think we're going to get any farther arguing in the abstract. Can the proponents of an API change provide a list of functions for which they think it should change? Are we only talking about the single diode functions?

While we're at it, we might also consider functions that return labeled data (either as DataFrames or dicts) that could simply be ordered instead.

Having sensible defaults for these arguments, when possible, makes this even easier and is also useful in enabling new users a path forward when they'd otherwise have no idea what value to insert.

I'm not aware of any pvlib functions that require input for which there exists reasonable defaults, but they exist then we should address them in separate issues.

cwhanse · 2020-05-05T02:55:06Z

I'm late to this conversation, but I lean against the suggestion to return dicts instead of admittedly long tuples. Two reasons, which have already been mentioned:

a dict requires users to conform to pvlib's key names, which almost certainly don't match the argument names in other packages. Returning tuples lets the user name the quantities returned by the pvlib function.
for the function layer, pvlib prefers to use one argument for each quantity being input to the model implemented. I think the same preference is useful for function output.

I do strongly favor changing the API so that positional arguments are in a consistent order - looking at the single diode functions in particular.

Peque · 2020-05-05T17:47:57Z

It is expected that pvlib's key names do not match other packages'. Not a huge hassle in my opinion to handle it with dicts vs. tuples (I assume other packages will have different order in their parameters as well, of course):

# Returning dicts
d = pvlib_function()
non_pvlib_function(x=d['bar'], y=d['foo'], z=d['baz'])

# Returning tuples
foo, bar, baz = pvlib_function()
non_pvlib_function(x=bar, y=foo, z=baz)

On the other hand, as an end-user, I would be much more scared if I had to rely on the order.

Of course I would assume that pvlib would take care of updating other pvlib functions if the order changed in the future, so this will always work:

t = pvlib_function()
another_pvlib_function(*t, ...)

But any change in the order in the future, for any reason, will break my code without raising an exception if tuples are used instead of dicts:

foo, bar, baz = pvlib_function()  # Now this should be "bar, baz, foo" instead
non_pvlib_function(x=bar, y=foo, z=baz)  # Silently yields wrong results

So, as an end user, I would vote against tuples. I would prefer my code to crash blatantly if the API changed in the future.

On the other hand, if what you are proposing is to freeze the API once this is decided (i.e.: the order will never change no matter what), then I take everything I just said back. I just don't know how stable you consider pvlib's API to be. 😊

adriesse · 2020-05-05T19:21:12Z

I second @wholmgren on this:

I don't think we're going to get any farther arguing in the abstract. Can the proponents of an API change provide a list of functions for which they think it should change?

Peque · 2020-05-05T20:28:54Z

My proposal was about the SDM functions in particular. Since my pvlib's API knowledge is very limited, I do not know if it should extend to other parts of the API, so I would feel unable to provide a list of functions.

I hope this issue ends up not causing more noise than it helps. ❤️

cwhanse · 2020-05-07T15:34:56Z

@Peque please take a look at the changes in #708, and let me know how we can improve the fit functions to make a better interface with the pvlib.pvsystem.calcparams_xxx functions.

For the record, I regret that I did not first reorganize pvlib.ivtools in a separate PR, rather than as part of the port of a fitting method from MATLAB.

Peque · 2020-05-10T12:56:20Z

@cwhanse 👍

In my opinion this could be improved:

fit_cec_sam and fit_sandia_simple return parameters in a tuple, but fit_desoto, fit_pvsyst_sandia, fit_desoto_sandia return parameters in a dict; this should be consistent (either dict or tuple)
If the fit_ functions are going to return tuples, that part of the API should be frozen forever (i.e.: the order should never change)
If the fit_ functions are going to return tuples, the calcparams_ functions should have the fit_-returned parameters first in the function definition and enforced as positional parameters when only Python 3.8+ is supported in the future; this part of the API should be also frozen

Minor suggestions:

I noted an inconsistency, already reported in Inconsistent parameter name in fit_sdm_cec_sam()? #958
There are inconsistencies in the use of upper/lower case in function parameters (i.e.: v_oc, EgRef, I_L_ref, I_o_ref, Adjust...) I would always use lower case for function parameters, with _ for separation (you could use flake8's pep8-naming plugin to check that)
R_s does not seem to have a _ref suffix, even if it is supposed to be the resistance at reference conditions; maybe those _ref suffixes could be always omitted to simplify naming?

adriesse · 2020-05-12T13:22:45Z

If all the SDM related functions become compatible with each other with just a few changes, I'm in favor.

As much as I like to see a layer of functions for everything at the base of pvlib (and as much as I like to limit myself to using only that layer) I do share the view of @kanderso-nrel that classes could provide a nice solution here.

mikofski · 2020-05-13T18:07:48Z

I don't really have a strong opinion here, and there are already a lot of voices, but I will just mention this: beware of classes. They can be useful if used thoughtfully, but they

can take up a lot of memory,
can be slow to create,
and may not play friendly with concurrency and task graphs.

As usually I may be completely wrong about this but I have seen otherwise. PVMismatch is a case in point. With cells as a "class" we have to do a lot of secret magic to keep the memory load down. This might be less of an issue if the cells' info was all in one giant NumPy array or other efficient memory container.

CameronTStark added the api label Apr 27, 2020

CameronTStark added this to the 0.7.3 milestone Apr 27, 2020

CameronTStark modified the milestones: 0.7.3, 0.8.0 May 4, 2020

wholmgren modified the milestones: 0.8.0, 0.9.0 Aug 28, 2020

wholmgren modified the milestones: 0.9.0, 0.10.0 May 15, 2021

kandersolar mentioned this issue May 11, 2022

"Numpy-isation" of irradiance decomposition functions #1455

Open

kandersolar mentioned this issue May 26, 2022

Data shift algorithm pvlib/pvanalytics#142

Merged

6 tasks

kandersolar removed this from the 0.10.0 milestone May 16, 2023

kandersolar added this to the Someday milestone May 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Return dictionaries instead of tuples #959

Return dictionaries instead of tuples #959

Peque commented Apr 26, 2020 •

edited

Loading

kandersolar commented Apr 27, 2020

Peque commented Apr 27, 2020

mikofski commented Apr 27, 2020

wholmgren commented Apr 27, 2020

mikofski commented Apr 27, 2020

Peque commented Apr 28, 2020 •

edited

Loading

adriesse commented Apr 28, 2020

mikofski commented Apr 28, 2020

kandersolar commented Apr 28, 2020

CameronTStark commented May 4, 2020

wholmgren commented May 5, 2020

cwhanse commented May 5, 2020 •

edited

Loading

Peque commented May 5, 2020

adriesse commented May 5, 2020

Peque commented May 5, 2020

cwhanse commented May 7, 2020

Peque commented May 10, 2020

adriesse commented May 12, 2020

mikofski commented May 13, 2020

Return dictionaries instead of tuples #959

Return dictionaries instead of tuples #959

Comments

Peque commented Apr 26, 2020 • edited Loading

kandersolar commented Apr 27, 2020

Peque commented Apr 27, 2020

mikofski commented Apr 27, 2020

wholmgren commented Apr 27, 2020

mikofski commented Apr 27, 2020

Peque commented Apr 28, 2020 • edited Loading

adriesse commented Apr 28, 2020

mikofski commented Apr 28, 2020

kandersolar commented Apr 28, 2020

CameronTStark commented May 4, 2020

wholmgren commented May 5, 2020

cwhanse commented May 5, 2020 • edited Loading

Peque commented May 5, 2020

adriesse commented May 5, 2020

Peque commented May 5, 2020

cwhanse commented May 7, 2020

Peque commented May 10, 2020

adriesse commented May 12, 2020

mikofski commented May 13, 2020

Peque commented Apr 26, 2020 •

edited

Loading

Peque commented Apr 28, 2020 •

edited

Loading

cwhanse commented May 5, 2020 •

edited

Loading