Add subsection on multiple observation model comparison #27

Merged
merged 3 commits into from
Dec 1, 2023

Conversation

OriolAbril
Member

As discussed in #14, this PR adds a new subsection to the model comparison section. It is based on the rugby model but applied to Premier League data. I still have some doubts about the best way to organize the notebook and what to add to make it clear and accessible. Here are some of them, but feel free to add extra suggestions:

  1. How should the explanation on the different questions be tackled? There will surely be the ArviZ approach of concatenating/adding pointwise log likelihood values (this is already in the notebook), what else should accompany these calculations:
    • Conceptual explanation on mimicking cross validation leaving one match/goal recording/team out
    • Mathematical reparametrization of the log likelihood in terms of the desired "unit observation". See discourse clarification
    • Auxiliary PyMC model implementing the reparametrization above. See this notebook
  2. I think this should be only about IC calculation and refer to model comparison subsection once IC calculation is obtained, but I'm not 100% sure about this. Would it make sense to compare to negative binomial for example?
  3. I need help with terminology 😅 leave one team out and leave one match out I think are ok, but leave half match out? I used observation in discourse but I think it was a bad idea
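A minimal sketch of the "concatenating/adding pointwise log likelihood values" idea from point 1, in plain Python so it stays backend agnostic. The goal counts and expected values below are made up for illustration, and a single posterior draw is used; in practice the same operation would be applied to every draw.

```python
import math


def poisson_logpmf(k, mu):
    """Log probability of observing k events under a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


# Hypothetical data: 3 matches and one posterior draw of the expected goals.
home_goals = [2, 0, 1]
away_goals = [1, 1, 3]
theta_home = [1.6, 1.1, 1.3]
theta_away = [0.9, 1.2, 1.8]

log_lik_home = [poisson_logpmf(y, t) for y, t in zip(home_goals, theta_home)]
log_lik_away = [poisson_logpmf(y, t) for y, t in zip(away_goals, theta_away)]

# Leave one match out: one pointwise value per match, so add home and away.
log_lik_match = [h + a for h, a in zip(log_lik_home, log_lik_away)]

# Leave one goal recording out: every count is its own unit, so concatenate.
log_lik_goals = log_lik_home + log_lik_away

print(len(log_lik_match), len(log_lik_goals))  # 3 6
```

Both versions carry the same total log likelihood; what changes is the unit that cross validation leaves out, and therefore the number of pointwise values.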

@AlexAndorra

AlexAndorra commented Jun 8, 2020

Thanks Oriol, this is a very good start! Regarding your questions:

  1. I agree with your first point -- this should really be in the NB, on the model of the explanation you wrote in the rugby NB for Discourse. More generally, a thorough conceptual explanation of the model is always welcome and will help with understanding the subsequent CVs.
    Your two other points seem less necessary to include in the NB to me -- or maybe just an explanation of the reparametrization of the log likelihood in terms of the desired unit observation, but maybe not the implementation of the models.

  2. I agree with you. Neg Binomial sounds good, but maybe also a hierarchical model? I think there are interesting clusters of teams here.

  3. I understood it was leave one match / goal / team out; I didn't see leave half a match out 🤔 More generally, what would be useful is explaining the concept of "leave one X out" -- why would we need to do that?

Finally, I think this would benefit from prior and posterior predictive checks, as well as posterior plots and trace plots. I'm very happy to contribute these parts, although I don't think I'll have time this week 😬

@OriolAbril
Member Author

OriolAbril commented Jun 8, 2020

I will get to the watermark and to sorting imports eventually 😄; for now I prefer to focus on the content. I would also like to eventually run it on PyMC3 3.9 and the latest ArviZ.

  1. I think that the discussion on the exchangeability condition should be present here to better serve the goal of the repo, I am not sure this is possible without the mathematical description of the likelihood nor the reparametrization example.

  2. I think this should be focused on information criteria and how to use them with multiple observations and hierarchical models, adding one extra model to perform the comparison all the way could serve that goal, but I don't want to get lost in a full model comparison and iterative improvement of the model.

  3. goal/observation/"half match" could all be used but I think that all of them can still be confusing 😕

> Finally, I think this would benefit from prior and posterior predictive checks, as well as posterior plots and trace plots. I'm very happy to contribute these parts, although I don't think I'll have time this week

As I said above, I am not sure this fits the goal of the section; there will be other sections on posterior predictive checks, trace plots and so on. We could look into making this a complete case study (and even extend the data to several years or several countries at the same time), but that would be a whole different story. I think this should be laser focused on IC plus multiple observations. This is why I am not even sure about model comparison: there already is a subsection on model comparison and I don't want to be too repetitive.

@AlexAndorra

AlexAndorra commented Jun 9, 2020

> I think that the discussion on the exchangeability condition should be present here to better serve the goal of the repo, I am not sure this is possible without the mathematical description of the likelihood nor the reparametrization example.

I always prefer more detail over less when explaining a method, but I'm trying to find the shortest explanation possible: the discussion on exchangeability is necessary, and if you think the mathematical description of the likelihood is needed too to bring the point home, then let's add it too. And I guess while writing this we will see whether the reparametrization examples should be added too. To be clear, I think they are of value, but they take up a lot of space; it's a trade-off.

> I think this should be focused on information criteria and how to use them with multiple observations and hierarchical models, adding one extra model to perform the comparison all the way could serve that goal, but I don't want to get lost in a full model comparison and iterative improvement of the model.

I see what you mean. A good idea could be to use this model and show how to improve it across various NBs of this repo, each NB being focused on one aspect of the iterative process. This would constitute a complete case-study, as you say, and I think this model is interesting because there are natural hierarchical structures (plus, it's not the Iris or Titanic datasets 😜 ).
I understand the need to be focused on IC, but I think we shouldn't give people the impression that you can use IC in a vacuum -- you use them in a specific part of the modeling workflow, so putting them in context seems important to me.

> goal/observation/"half match" could all be used but I think that all of them can still be confusing

The last two are confusing, but the first one is quite clear to me: what's the expected predictive accuracy of the models if we were trying to predict which team will score the next goal? But maybe I misunderstood what "leave one goal out" means 😆

@OriolAbril
Member Author

> > goal/observation/"half match" could all be used but I think that all of them can still be confusing
>
> The last two are confusing, but the first one is quite clear to me: what's the expected predictive accuracy of the models if we were trying to predict which team will score the next goal? But maybe I misunderstood what "leave one goal out" means

All 3 are definitely confusing 😅. We are assessing the predictive accuracy of the model when predicting how many goals one of the teams will score in a match, without constraining the team to be home or away. In match A vs B we want to predict how many goals A will score and how many goals B will score, independently. It is similar to the match case, so it makes sense that the result is similar, but there is a very important difference. Here our predictions are "easy" (scalar) but we have twice as many observations to predict, whereas in the match case our predictions are "hard" (we want to guess not one but 2 values) but we have fewer observations to predict. I don't know how to explain this better, which is why I think either the math or an implementation of what I mean will help. m_goals here is the model corresponding to this confusing case.
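In symbols, the difference between the two cases could be written roughly as follows. This is a sketch that assumes the home and away counts are conditionally independent given the parameters $\theta$, as they are when each has its own Poisson likelihood; $n$ is the number of matches.

```latex
\begin{aligned}
\text{match as the unit:}\quad
  & \log p(y_g \mid \theta) = \log p(y_{g,h} \mid \theta) + \log p(y_{g,a} \mid \theta),
  && g = 1, \dots, n \\
\text{goal count as the unit:}\quad
  & \log p(y_{g,t} \mid \theta),
  && g = 1, \dots, n, \quad t \in \{h, a\}
\end{aligned}
```

The first line gives $n$ pointwise values, each covering a pair of counts; the second gives $2n$ pointwise values, each covering a single count.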

I also realized that we may want to keep this backend agnostic? @aloctavodia @canyon289 I guess that would tip the decision towards loading the inference data from netCDF and sticking to the math.

@canyon289
Member

canyon289 commented Jun 10, 2020 via email

@AlexAndorra

Thanks Oriol, this makes complete sense 👌
I'm torn honestly: showing the different implementations with the models is super useful pedagogically, but it takes up a lot of space and isn't platform-agnostic -- which is pretty much what I was saying in my last comment actually 😆
In a perfect world, I would include them and the math. Let's wait for Osvaldo's and Ravin's opinion.

@canyon289
Member

To your question about splitting this out.

> I think this should be only about IC calculation and refer to model comparison subsection once IC calculation is obtained, but I'm not 100% sure about this. Would it make sense to compare to negative binomial for example?

I would agree that splitting the explanation of the IC calculation from the comparison seems to make sense. As far as the code goes, it's a great example of how to do model checks with multiple likelihoods.

As far as soccer/football goes, I don't know what a half match is, so I lack the domain knowledge to know what's going on.

I like the direction this is going, am interested to see the text that explains what is going on!

@OriolAbril
Member Author

I have extended the content with some math explanation and alternative implementations, so now there is actually content to review. I am still not sure about what to include and what to exclude, but most of the content was already written; I basically had to gather the pieces and put them together. I think it will be clear now. Most of the work should go into making this both clear and concise, as it is still a little chaotic.

@AlexAndorra

Thanks Oriol, just skimmed through it and it looks awesome! Here is a first batch of comments.
I stopped at "Information criterion calculation" and will come back to it later today 😉

AlexAndorra commented on 2020-06-12T10:00:17Z
----------------------------------------------------------------

I wonder if "multiple-likelihoods models" isn't less confusing than "multi-observation models". I fear the latter could be interpreted as "you have multiple data points", not as "you have multiple likelihood distributions"

OriolAbril commented on 2020-06-12T14:08:42Z
----------------------------------------------------------------

Good point, I'll change this. I will probably agree with any proposal to reduce the number of times observation is used, I think it is ambiguous in this context.

AlexAndorra commented on 2020-06-12T10:00:18Z
----------------------------------------------------------------

  • Nitpicking: "poisson distribution" should be with a capital "P" ;)
  • yh and ya are not indexed by team here, so does this mean each y_g,h is actually a vector? I.e., does y_1,h contain all the goals scored by the 10 teams who played at home on the first game of the season? If yes, I think this should be specified above.
  • "... each team's scoring and defensive power..."
  • "on the attacking power of the home team and on the defensive power of the away team."
  • Regarding your note: I think I'd rather say "The expected number of goals scored by the home team (theta_g,h)" -- at least the first time we introduce this model, to be explicit.
  • Also, I'd call the intercept "alpha" rather than "beta_0", but that's just a personal preference.
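
For readers following along without the notebook, the model under discussion has roughly this shape. This is only a sketch: the home advantage term and the index notation $h(g)$, $a(g)$ (the home and away team of match $g$) are assumptions borrowed from the PyMC3 rugby example, not taken from this thread.

```latex
\begin{aligned}
y_{g,h} \mid \theta_{g,h} &\sim \text{Poisson}(\theta_{g,h}), &
\log \theta_{g,h} &= \beta_0 + \text{home} + \text{att}_{h(g)} + \text{def}_{a(g)} \\
y_{g,a} \mid \theta_{g,a} &\sim \text{Poisson}(\theta_{g,a}), &
\log \theta_{g,a} &= \beta_0 + \text{att}_{a(g)} + \text{def}_{h(g)}
\end{aligned}
```

Here $\theta_{g,h}$ is the expected number of goals scored by the home team in match $g$, which depends on the attacking power of the home team and on the defensive power of the away team.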

OriolAbril commented on 2020-06-12T14:30:17Z
----------------------------------------------------------------

In the league there are 20 teams, which means that there are 38 match days (each team plays every other team twice) and 10 matches every match day. The g index indicates the match id (not the day) out of n=380 matches. Therefore y_g,h are scalars. In fact we have no info at all about the match days in this model.
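
The counts above can be checked with a few lines of Python. This is just the arithmetic of a double round robin, with teams represented by hypothetical integer ids.

```python
from itertools import permutations

n_teams = 20

# Every ordered (home, away) pair of distinct teams is one match.
matches = list(permutations(range(n_teams), 2))
n_matches = len(matches)

# Each team plays once per match day, so there are n_teams / 2 matches per day.
matches_per_day = n_teams // 2
n_match_days = n_matches // matches_per_day

print(n_matches, matches_per_day, n_match_days)  # 380 10 38
```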

AlexAndorra commented on 2020-06-13T14:15:12Z
----------------------------------------------------------------

Ok, much clearer, thanks!

AlexAndorra commented on 2020-06-12T10:00:18Z
----------------------------------------------------------------

  • "... is the following one:"
  • I think the parametrization with atts_star should be explained. Also, is there a more informative name than atts_star?

OriolAbril commented on 2020-06-12T14:31:46Z
----------------------------------------------------------------

I'd defer readers interested in that to go to the source of the model (I have in mind to add the sources on top like in the pymc3 example notebook).

AlexAndorra commented on 2020-06-13T14:15:49Z
----------------------------------------------------------------

Ow, there is a source for this model?? I'm interested!

OriolAbril commented on 2020-06-13T17:12:39Z
----------------------------------------------------------------

I used the code in the PyMC3 rugby example, which in turn is based on this Premier League example: http://danielweitzenfeld.github.io/passtheroc/blog/2014/10/28/bayes-premier-league/

@aloctavodia
Contributor

I still have not had the time to review this, but here is a general comment about this repository, motivated by question 1. I think each notebook should focus on a single topic (and we can go as granular as we want); if we want to show a more complete "Bayesian workflow" we should have a dedicated notebook (or notebooks) to do that. And we should try to discuss as much theory as possible (always keeping in mind the theoretical elements that are useful for the applications). As not every potential user of this repo will be interested in going too deep on the theoretical side, we may have an "in depth" section per notebook. In fact, a few of the already available notebooks have one. I am not saying we must follow this pattern in every notebook, but we can use it if necessary or desired. Another (not mutually exclusive) option is to have a few more theoretical notebooks and others that are more practical.

@AlexAndorra

Thanks @OriolAbril, this is really nice now! The explanations are very clear and the examples easy to follow 👏
I added some comments and suggested some changes below. Happy to discuss with you and implement them if you want / don't have time 🙂

AlexAndorra commented on 2020-06-13T15:34:04Z
----------------------------------------------------------------

  • "Due to the presence of the two likelihoods in our model ..." (to be consistent with my first comment)
  • "... travel to the match where our team will score... "
  • Really love the explanation in the bullet points!
  • "... this model predicts the number of goals scored. Its results can be used to estimate probabilities of victory and other derived quantities, but calculating the likelihood of these derived quantities may not be straightforward. And as you can see above, there isn't one unique predictive task: it all depends on your domain knowledge and the scientific question you're interested in. As often in statistics, the answer to these questions lies outside the model -- you tell the model what to do, not the other way around."
  • "to show how this kind of task would be performed with ArviZ. But let's see what ArviZ says when you naively ask it for the LOO of a multiple-likelihoods model:"

AlexAndorra commented on 2020-06-13T15:34:05Z
----------------------------------------------------------------

I love making ArviZ fail on purpose, very pedagogical. Here is what I'd write just after that cell, to expand on what you already wrote:

"The error message is quite clear: ArviZ doesn't know what to do with the several likelihoods it found. This is because this information is neither in the model nor in the data, as we said above: we need to tell ArviZ what we're interested in. In other words, we are the boss here, and the model needs us more than we need it!"


OriolAbril commented on 2020-06-13T17:00:53Z
----------------------------------------------------------------

I would say one goal of the notebook is understanding why ArviZ fails with multiple likelihoods, however the error may be read by users who have not read the notebook, it is in this case that I am not sure the error provides any useful info.
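
To make the ambiguity concrete, here is a toy stand-in written in plain Python. It is not ArviZ's code: `select_log_likelihood`, the variable names `home_goals` and `away_goals`, and the wording of the error are all hypothetical. In ArviZ itself the corresponding knob is, as far as I know, the `var_name` argument of `az.loo`.

```python
def select_log_likelihood(log_likelihood, var_name=None):
    """Pick the pointwise log likelihood to use, failing if the choice is ambiguous."""
    names = list(log_likelihood)
    if var_name is None:
        if len(names) != 1:
            # Neither the model nor the data say which one is wanted.
            raise ValueError(
                f"Found several log likelihood arrays {names}; pass var_name to choose one"
            )
        var_name = names[0]
    return log_likelihood[var_name]


# Hypothetical pointwise values for two matches and two likelihoods.
log_likelihood = {"home_goals": [-1.2, -0.8], "away_goals": [-1.5, -0.9]}

try:
    select_log_likelihood(log_likelihood)
    ambiguous = False
except ValueError as err:
    ambiguous = True
    print(err)

# Stating the predictive task explicitly removes the ambiguity.
away = select_log_likelihood(log_likelihood, var_name="away_goals")
```

The point of the sketch is that the function cannot guess the predictive task; the user has to name it, or build the combined pointwise values beforehand.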

AlexAndorra commented on 2020-06-14T14:41:43Z
----------------------------------------------------------------

Ow ok, outside of this NB context you mean. Maybe we can add a link to this NB in the error message, once the NB is merged?

AlexAndorra commented on 2020-06-13T15:34:06Z
----------------------------------------------------------------

I think we should show how to do it in the order of the examples you cited above. So here it should deal with away goals instead of home goals. Or we can just change the bullet point above.


OriolAbril commented on 2020-06-13T17:02:43Z
----------------------------------------------------------------

yeah, I'll change that. I had both cases, but the only difference is changing the home by away, so I figured it was not worth it and I don't know why I erased the away one 🤷

AlexAndorra commented on 2020-06-13T15:34:07Z
----------------------------------------------------------------

I'd modify the formulation a bit, to relate it more to what we just did. Something like:

"Actually, what we just did corresponds to another, specific implementation of our base model. But this time, we'll target our model to the specific task of predicting the goals scored by away teams. Notice how we do not throw away..."


OriolAbril commented on 2020-06-13T17:10:34Z
----------------------------------------------------------------

I am still not sure about adding the alternative implementations, I would prefer to keep the notebook backend agnostic (I think it will be possible once the updated rugby data is available in ArviZ). I already had them so I decided to include them in case they could help any of you clarify some doubts.

Moreover, I think it will be better to use some diagrams to highlight the observations we are interested in. I'll add them whenever I have time.

AlexAndorra commented on 2020-06-14T14:44:52Z
----------------------------------------------------------------

Diagrams are a great idea! And it's true that if we add diagrams, then the other implementations of the model are less necessary (although interesting to keep somewhere). That'd be great if we could hide/show cells in Jupyter NBs (like with the new InferenceData HTML repr). That way, we could hide the cells with the other models by default and users could look at them if needed.

AlexAndorra commented on 2020-06-13T15:34:08Z
----------------------------------------------------------------

  • Could use some spacing, as in the base model
  • Maybe just a few words about what the Potential does?

OriolAbril commented on 2020-06-13T17:15:20Z
----------------------------------------------------------------

I agree if we eventually add the code it should probably have some explanation or link to a description of pm.Potential, as I said above though, I am not sure about this being inside the scope of the notebook.

AlexAndorra commented on 2020-06-13T15:34:08Z
----------------------------------------------------------------

  • "Another option is being interested in the outcome of the whole match -- this is the second example we talked about in our bullet points above."
  • "In our current model, the outcome of a match is not who wins or the aggregate of goals scored by both teams. The outcome is the goals scored by the home team and by the away team, both quantities at the same time. Here, the utility we're trying to maximize, as slightly nerdy football fans, is the number of pairs of goals -- matches that end up 3-3 or 4-4 are the ones that better fit our football tastes."

OriolAbril commented on 2020-06-13T17:29:15Z
----------------------------------------------------------------

I think the second bullet point is a little confusing/imprecise: what we are doing here is assessing the predictive accuracy of predicting the outcome of the whole match. If our ultimate goal is the one above (going to matches that have the most probability of ending 3-3 or 4-4) this strategy would be the one to use in order to get the predictive accuracy of the desired "observation" (if we had several models, we would compare them based on these loo values instead of away team predictive accuracy). But there are other cases where the interest lies in the whole outcome of the match, the 3-3 case is only one example. Other silly examples could be: wanting to go to the dullest match because you don't care much about the match and want to be able to talk during it (maybe you are bringing your significant other); having some kind of fetish for matches that end up 4-1; predicting the whole outcome of matches to bet on them...

In the same betting example, you may realize there is more money in guessing the goals of the away team than the goals of the home team and therefore you'd compare models with the away goals metric (like the supporters with low budget). I don't want to use betting examples though.

AlexAndorra commented on 2020-06-14T14:52:29Z
----------------------------------------------------------------

To be clear, I had already understood what you spelled out above from the material already present in the NB. I like your point about changing the precise example, instead of taking the same as in the bullet point. That way, people will understand that 3-3 or 4-4 is just an example.

Something like:

"Here, the utility we're trying to maximize, as slightly nerdy football fans, is the pairs of goals -- matches that best fit our football tastes could be those that end up 4-4, or those with the lowest number of goals because we don't care much about the game and want to be able to talk during it, or those that end up with a precise score, like 6-2 (because we also love tennis)."

AlexAndorra commented on 2020-06-13T15:34:09Z
----------------------------------------------------------------

"As in our first example, this predictive task corresponds to a specific model that we could have written as follows in the first place:"


AlexAndorra commented on 2020-06-13T15:34:10Z
----------------------------------------------------------------

  • "... being interested in the goals scored per match and per team."


canyon289 commented on 2020-06-14T14:29:36Z
----------------------------------------------------------------

Reading through all the text you've added here and so far so good! Readable and understandable!


@OriolAbril
Member Author

Here is a preview of the kind of diagrams I had in mind. I would include only the picture and not the code to generate it (I would then probably make a blog post with the code and diagram generation). (Note that the diagrams are for the rugby dataset, not the Premier League.)

[3 images: diagram previews]

source code: https://nbviewer.jupyter.org/github/OriolAbril/oriol_unraveled/blob/multi_lik/_notebooks/2020-06-21-multi-likelihood-diagrams.ipynb

@AlexAndorra

Good idea, that looks nice! And indeed, releasing the code in a subsequent blog post seems appropriate -- showing how to display this figure isn't the core of this NB. We'll have to explain how to interpret the diagram though.

add diagrams
remove alternative PyMC3 implementation
add leave one team out draft description
@OriolAbril
Member Author

I think I got the right skeleton now, only big structural change could be using rugby dataset instead to delete all backend specific info. Let me know what you think.

Small typo: "straighforward" is misspelled



@aloctavodia
Contributor

@OriolAbril If you are OK with it, I would like to merge this, and then work on adapting it to PyMC 5 and integrating it with the rest of the chapters.

@OriolAbril
Member Author

OriolAbril commented Dec 1, 2023

Sounds good! I also want to generate an alternative model with a group level variable as covariate (e.g. the annual budget of the team) so we can actually do model comparison.

The trickiest part might be having to run all 4 (or more) models every time we want to build the website though

@aloctavodia
Contributor

I have not used it yet, but Quarto has a freeze feature https://quarto.org/docs/projects/code-execution.html#freeze
The other option (more DIY style) is to run the models once (or manually when needed) and write the code for the model in markdown cells instead of code cells, so they don't execute. We will figure it out. For the moment my main goal is to tidy up stuff.

@aloctavodia aloctavodia merged commit ff0f381 into arviz-devs:main Dec 1, 2023
4 participants