Lineage-associated mutations #180

sydelstan · 2021-11-09T23:40:30Z

I ran the lmm with the lineage option. It returned significant hits associated with certain lineages. I looked at the strains expressing the significant hits and found that they are expressed in lineage 1, but the output noted they are associated with lineage 2. How is this possible?

johnlees · 2021-11-10T09:33:02Z

I looked at the strains expressing the significant hits and found that they are expressed in lineage 1

I don't understand what you mean by this statement, could you please expand?

sydelstan · 2021-11-10T15:54:47Z

The output returns sites with variants that are significantly associated with a phenotype, the samples in which the variants are found, and the lineage the variant is most associated with. In the output, there are instances where variants are expressed in strains from lineage x, but the variants are associated with lineage y. How can a variant be associated with lineage y if all of the variants are expressed in strains from lineage x?

johnlees · 2021-11-10T16:10:05Z

I think it might help if you were able to provide some examples of output(s) you're referring to? Can I also confirm that by 'expressed' you mean 'present'?
It's possible this could an issue with the sign of the effect not being considered, but it's a while since I've worked with this bit of the code

sydelstan · 2021-11-10T16:30:32Z

Yes by ‘expressed’ I mean ‘present’

I attached a screenshot. The variants in column B are presents in the samples in column J. Column I says that the variants in Column B are associated with Lineage 1, but I know that the samples in column J are not Lineage 1, they belong to another lineage.

johnlees · 2021-11-15T12:53:40Z

Here is the code used to determine the lineage:
https://github.com/mgalardini/pyseer/blob/master/pyseer/model.py#L145-L188

So we (logistic) regress k-mer presence against lineage markers, and take the strongest association.
There are two things that could be happening here:

Lineage two may be excluded from the regression. We need to remove one lineage otherwise the regression is not of full rank and won't converge. This will be the lineage with the lowest p-value. See code here. In associations with more lineages this usually isn't a problem, but here if you only have a few it may cause an unexpected relation.
With a small number of lineages, it is also plausible that the largest p-value may be to another lineage, but negatively associated, as we don't actually check this sign. i.e lineage 1 is much more common, and there is a strong negative association with it; the positive association with lineage 2 is weaker as it is rarer.

We should probably change the behaviour of 2) in either case, but to rule out 1) you could check what rank the expected lineage is in the lineage effects file.

(note, I think 2) may need to be thought about a bit more using MDS components, would a negative association still be ok?)

sydelstan · 2021-11-15T13:39:30Z

I've included three lineages, and I used the --lineage-clusters option. I'm still not sure why the variants would be associated with a lineage where no strains express that variant

johnlees · 2021-11-15T14:16:45Z

Could you share the contents of lineage_effects.txt?

sydelstan · 2021-11-16T03:24:36Z

BDQ_CA_lineage.txt

johnlees · 2021-11-16T09:15:50Z

Ok based on that, I think it's option 2). What you could do is replace lines 176-181 in model.py with:

    try:
        lineage_res = lineage_mod.fit(method='newton', disp=False)

        wald_test = np.divide(lineage_res.params, lineage_res.bse)
        # excluding intercept and covariates
        max_lineage = np.argmax(wald_test[1:lin.shape[1]+1])

i.e. remove np.absolute

and then run again. Note that you can run from source if you clone the repo, and run python pyseer-runner.py

sydelstan · 2021-11-17T00:31:40Z

Great, I'll try that

johnlees added the bug label Nov 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lineage-associated mutations #180

Lineage-associated mutations #180

sydelstan commented Nov 9, 2021

johnlees commented Nov 10, 2021

sydelstan commented Nov 10, 2021

johnlees commented Nov 10, 2021

sydelstan commented Nov 10, 2021

johnlees commented Nov 15, 2021

sydelstan commented Nov 15, 2021

johnlees commented Nov 15, 2021

sydelstan commented Nov 16, 2021

johnlees commented Nov 16, 2021

sydelstan commented Nov 17, 2021

Lineage-associated mutations #180

Lineage-associated mutations #180

Comments

sydelstan commented Nov 9, 2021

johnlees commented Nov 10, 2021

sydelstan commented Nov 10, 2021

johnlees commented Nov 10, 2021

sydelstan commented Nov 10, 2021

johnlees commented Nov 15, 2021

sydelstan commented Nov 15, 2021

johnlees commented Nov 15, 2021

sydelstan commented Nov 16, 2021

johnlees commented Nov 16, 2021

sydelstan commented Nov 17, 2021