Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return dictionaries instead of tuples #959

Open
Peque opened this issue Apr 26, 2020 · 19 comments
Open

Return dictionaries instead of tuples #959

Peque opened this issue Apr 26, 2020 · 19 comments
Labels
Milestone

Comments

@Peque
Copy link
Contributor

Peque commented Apr 26, 2020

I found it a bit inconvenient to have fit_sdm_cec_sam() return a tuple. I think returning a dictionary with ['I_L_ref', 'I_o_ref', 'R_sh_ref', 'R_s', 'a_ref', 'Adjust'] keys would be much more convenient and safe.

The same goes for calcparams_cec(), which could return a dictionary with ['photocurrent', 'saturation_current', 'resistance_series', 'resistance_shunt', 'nNsVth'] keys instead.

I think that is more Pythonic (explicit is better than implicit and readability counts).

Also, more convenient API-wise:

cec_parameters = fit_sdm_cec_sam(...)
sd_parameters = calcparams_cec(..., **cec_parameters)
curve_info = singlediode(..., **sd_parameters)

The tests could be simplified as well. They currently look like:

I_L_ref, I_o_ref, R_sh_ref, R_s, a_ref, Adjust = ivtools.fit_sdm_cec_sam(...)
expected = pd.Series(get_cec_params_cansol_cs5p_220p['output'])
modeled = pd.Series(index=expected.index, data=np.nan)
modeled['a_ref'] = a_ref
modeled['I_L_ref'] = I_L_ref
modeled['I_o_ref'] = I_o_ref
modeled['R_sh_ref'] = R_sh_ref
modeled['R_s'] = R_s
modeled['Adjust'] = Adjust
assert np.allclose(modeled.values, expected.values, rtol=5e-2)

They could look like:

modeled = ivtools.fit_sdm_cec_sam(...)
expected = get_cec_params_cansol_cs5p_220p['output']
assert modeled == pytest.approx(expected, rel=5e-2)
@kandersolar
Copy link
Member

For reference, tuples vs dicts was discussed briefly (with no immediate resolution) in this thread: #784 (comment)

@Peque
Copy link
Contributor Author

Peque commented Apr 27, 2020

PD: Since what I am proposing is a breaking change... I also think that reducing the number of parameters in a function is good (i.e.: curve_info = singlediode(..., sd_parameters=sd_parameters)), where sd_parameters is a variable containing all the SDM parameters. Although this could also be applied with a tuple instead of a dict.

@mikofski
Copy link
Member

My preference would be to use keyword arguments instead, eg: curve_info = singlediode(..., a, b, c, d) = singlediode(..., **sd_parameters) where sd_parameters = {'a': 12.3, 'b': 4.56, 'c': 0.789, 'd': False}. I believe there are a couple of reasons why this could be more Pythonic:

  1. The keyword arguments are explicit. If an argument is misspelled or missing, the function will raise an exception
  2. Whereas I believe a dictionary argument would be implicit. It requires documentation to tell the user what the required keys should be. This is potentially error prone, one could mistype a key or miss one.
  3. Using keywords allows the user all options, pass the dictionary using splat, as keywords, or positional.

@CameronTStark CameronTStark added this to the 0.7.3 milestone Apr 27, 2020
@wholmgren
Copy link
Member

In my experience maintaining pvlib, it's been easier to deal with things that require order (positional args and returned tuples) than things that require labels (keyword args and returned dicts or DataFrames). I do use a lot of dicts, Series, and DataFrames in my work, so I certainly appreciate their general value. It just seems they're harder for a more general purpose library. As I mentioned in the linked issue, I'd really like to read a blog post about this kind of API design question in scientific computing.

It's also not clear to me that you'd actually need to switch to keyword arguments to achieve the input simplification you're after. Examples showing that it can be done with positional arguments:

In [1]: def f(a, b):
   ...:     print(a, b)
   ...:

In [2]: f(b=2, a=1)
1 2

In [3]: d = dict(a=1, b=2)

In [4]: f(**d)
1 2

In [5]: t = (1, 2)

In [6]: f(*t)
1 2

The test simplification could work with a tuple too:

assert (1., 5.) == pytest.approx((1., 5.))

@mikofski
Copy link
Member

Will you explained what I was trying to say, but much better, thanks!

@Peque
Copy link
Contributor Author

Peque commented Apr 28, 2020

@mikofski I have no strong opinion on packing/unpacking parameters. 😊

@wholmgren Yeah, tests simplification can be done either way, I should not have brought it to this issue. 👍

@wholmgren In my opinion positional args should be enforced then. That is possible in Python 3.8, but not before. Also, positional arguments must come before keyword arguments, and that may require further API modifications.

In example:

# Assuming `cec_params` is a tuple...
cec_params = fit_sdm_cec_sam(...)
# This works fine
calcparams_cec(
    1000,
    25,
    0.004,
    2.5,
    *cec_params,
    ...,
)
# This does not
calcparams_cec(
    effective_irradiance=1000,
    temp_cell=25,
    alpha_sc=0.004,
    a_ref=2.5,
    *cec_params,
    ...,
)

You can't just use keyword parameters before the cec_params expansion and, in my opinion, that makes the code less legible/explicit and the API less convenient/friendly to use. So, in my opinion it would be better to change the API to have *cec_params first, and then effective_irradiance, temp_cell, ....

To summarize, I think returning a dict is better and I do not mind that much about params=dictionary vs. **dictionary on function calls. But your project, your rules! Feel free to close this. 😜

@adriesse
Copy link
Member

A worthy discussion.

@mikofski
Copy link
Member

ICYMI: @kanderso-nrel and @CameronTStark - could be useful to get your opinions on this, if you care to weigh in.

@kandersolar
Copy link
Member

On the subject of passing groups of values into a function as a unit vs individually -- if the values are coherent enough to interact as a group in several parts of the API, I wonder if they would be better grouped as a simple object like scipy's OptimizeResult instead of a dictionary.

For positional-only arguments, I don't think it's a good idea at this time. For one, it would require a big bump in minimum python version, and for two I don't think pvlib satisfies any of the use cases listed in PEP 570.

For returning dicts vs tuples... a 6-element tuple is a bit unwieldy. On the other hand, returning a dictionary for two values would seem silly. I notice fit_sdm_desoto returns a dictionary. But taking a step back, it seems like there will always be some complexity to the plumbing of these low-level interfaces, which suggests maybe they need a high-level interface. Just throwing out an idea, what about some sort of "PVDevice" class that behaves like this?

device = fit_device(v_mp, i_mp, ... model='cec')
# or
device = Device(I_L_ref, ..., model='cec')
# calls calcparams_x and singlediode under the hood:
curve_info = device.ivcurve(effective_irradiance, ...) 

@CameronTStark CameronTStark modified the milestones: 0.7.3, 0.8.0 May 4, 2020
@CameronTStark
Copy link
Contributor

@wholmgren 's description of the situation is good although I would prefer to have a dictionary on the output if there are more than two values being returned. This makes it possible to enable easy transfer of the output of one function into another without careful positioning.

In [1]: def outputs_dictionary():
         ...:     return dict(this="foo", that="bar", other_thing="baz")

In [2]: def uses_output_dictionary(first_arg, that, other_thing, this, last_arg):
         ...:     print(this, that, other_thing)

In [3]: example_dictionary = outputs_dictionary()

In [4]: example_dictionary
Out[4]: {'this': 'foo', 'that': 'bar', 'other_thing': 'baz'}

In [5]: uses_output_dictionary(42, **example_dictionary, last_arg=1337)
foo bar baz

In [6]: uses_output_dictionary(last_arg=1337, **example_dictionary, first_arg=42)
foo bar baz

This is useful in interactive data analysis as well as in programmatic scripting to unify the API. Having sensible defaults for these arguments, when possible, makes this even easier and is also useful in enabling new users a path forward when they'd otherwise have no idea what value to insert.

@wholmgren
Copy link
Member

I don't think we're going to get any farther arguing in the abstract. Can the proponents of an API change provide a list of functions for which they think it should change? Are we only talking about the single diode functions?

While we're at it, we might also consider functions that return labeled data (either as DataFrames or dicts) that could simply be ordered instead.

Having sensible defaults for these arguments, when possible, makes this even easier and is also useful in enabling new users a path forward when they'd otherwise have no idea what value to insert.

I'm not aware of any pvlib functions that require input for which there exists reasonable defaults, but they exist then we should address them in separate issues.

@cwhanse
Copy link
Member

cwhanse commented May 5, 2020

I'm late to this conversation, but I lean against the suggestion to return dicts instead of admittedly long tuples. Two reasons, which have already been mentioned:

  • a dict requires users to conform to pvlib's key names, which almost certainly don't match the argument names in other packages. Returning tuples lets the user name the quantities returned by the pvlib function.
  • for the function layer, pvlib prefers to use one argument for each quantity being input to the model implemented. I think the same preference is useful for function output.

I do strongly favor changing the API so that positional arguments are in a consistent order - looking at the single diode functions in particular.

@Peque
Copy link
Contributor Author

Peque commented May 5, 2020

It is expected that pvlib's key names do not match other packages'. Not a huge hassle in my opinion to handle it with dicts vs. tuples (I assume other packages will have different order in their parameters as well, of course):

# Returning dicts
d = pvlib_function()
non_pvlib_function(x=d['bar'], y=d['foo'], z=d['baz'])

# Returning tuples
foo, bar, baz = pvlib_function()
non_pvlib_function(x=bar, y=foo, z=baz)

On the other hand, as an end-user, I would be much more scared if I had to rely on the order.

Of course I would assume that pvlib would take care of updating other pvlib functions if the order changed in the future, so this will always work:

t = pvlib_function()
another_pvlib_function(*t, ...)

But any change in the order in the future, for any reason, will break my code without raising an exception if tuples are used instead of dicts:

foo, bar, baz = pvlib_function()  # Now this should be "bar, baz, foo" instead
non_pvlib_function(x=bar, y=foo, z=baz)  # Silently yields wrong results

So, as an end user, I would vote against tuples. I would prefer my code to crash blatantly if the API changed in the future.

On the other hand, if what you are proposing is to freeze the API once this is decided (i.e.: the order will never change no matter what), then I take everything I just said back. I just don't know how stable you consider pvlib's API to be. 😊

@adriesse
Copy link
Member

adriesse commented May 5, 2020

I second @wholmgren on this:

I don't think we're going to get any farther arguing in the abstract. Can the proponents of an API change provide a list of functions for which they think it should change?

@Peque
Copy link
Contributor Author

Peque commented May 5, 2020

My proposal was about the SDM functions in particular. Since my pvlib's API knowledge is very limited, I do not know if it should extend to other parts of the API, so I would feel unable to provide a list of functions.

I hope this issue ends up not causing more noise than it helps. ❤️

@cwhanse
Copy link
Member

cwhanse commented May 7, 2020

@Peque please take a look at the changes in #708, and let me know how we can improve the fit functions to make a better interface with the pvlib.pvsystem.calcparams_xxx functions.

For the record, I regret that I did not first reorganize pvlib.ivtools in a separate PR, rather than as part of the port of a fitting method from MATLAB.

@Peque
Copy link
Contributor Author

Peque commented May 10, 2020

@cwhanse 👍

In my opinion this could be improved:

  • fit_cec_sam and fit_sandia_simple return parameters in a tuple, but fit_desoto, fit_pvsyst_sandia, fit_desoto_sandia return parameters in a dict; this should be consistent (either dict or tuple)
  • If the fit_ functions are going to return tuples, that part of the API should be frozen forever (i.e.: the order should never change)
  • If the fit_ functions are going to return tuples, the calcparams_ functions should have the fit_-returned parameters first in the function definition and enforced as positional parameters when only Python 3.8+ is supported in the future; this part of the API should be also frozen

Minor suggestions:

  • I noted an inconsistency, already reported in Inconsistent parameter name in fit_sdm_cec_sam()? #958
  • There are inconsistencies in the use of upper/lower case in function parameters (i.e.: v_oc, EgRef, I_L_ref, I_o_ref, Adjust...) I would always use lower case for function parameters, with _ for separation (you could use flake8's pep8-naming plugin to check that)
  • R_s does not seem to have a _ref suffix, even if it is supposed to be the resistance at reference conditions; maybe those _ref suffixes could be always omitted to simplify naming?

@adriesse
Copy link
Member

If all the SDM related functions become compatible with each other with just a few changes, I'm in favor.

As much as I like to see a layer of functions for everything at the base of pvlib (and as much as I like to limit myself to using only that layer) I do share the view of @kanderso-nrel that classes could provide a nice solution here.

@mikofski
Copy link
Member

I don't really have a strong opinion here, and there are already a lot of voices, but I will just mention this: beware of classes. They can be useful if used thoughtfully, but they

  • can take up a lot of memory,
  • can be slow to create,
  • and may not play friendly with concurrency and task graphs.

As usually I may be completely wrong about this but I have seen otherwise. PVMismatch is a case in point. With cells as a "class" we have to do a lot of secret magic to keep the memory load down. This might be less of an issue if the cells' info was all in one giant NumPy array or other efficient memory container.

@kandersolar kandersolar added this to the Someday milestone May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants