Pykelihood is a Python package for statistical analysis designed to give more flexibility to likelihood-based inference than is possible with scipy.stats. Distributions are designed from an Object Oriented Programming (OOP) point of view.
Main features include:
- use any scipy.stats distribution, or make your own,
- fit distributions of arbitrary complexity to your data,
- add trends of different forms in the parameters of any distribution,
- condition the log-likelihood with any form of penalty,
- profile parameters with a penalised log-likelihood,
- more to come...
pip install pykelihood
git clone https://www.github.com/OpheliaMiralles/pykelihood
or
gh repo clone OpheliaMiralles/pykelihood
The most basic thing you can use pykelihood
for is creating and manipulating distributions as objects.
>>> from pykelihood.distributions import Normal
>>> n = Normal(1, 2)
>>> n
Normal(loc=1.0, scale=2.0)
n
is an object of type Normal
. It has 2 parameters, loc
and scale
. They can be accessed like standard
Python attributes:
>>> n.loc
1.0
Using the Normal
object, you can calculate standard values using the same semantics as scipy.stats:
>>> n.pdf([0, 1, 2])
array([0.17603266, 0.19947114, 0.17603266])
>>> n.cdf([0, 1, 2])
array([0.30853754, 0.5 , 0.69146246])
Or you can also generate random values according to this distribution:
>>> n.rvs(10)
array([ 3.31370986, 5.02699468, -0.3573229 , 1.00460378, -3.26044871,
1.86362711, -0.84192901, 0.81132182, -2.03266978, 1.48079944])
Let's generate a larger sample from our previous object:
>>> data = n.rvs(1000)
>>> data.mean()
1.025039359276458
>>> data.std()
1.9376460645596842
We can fit a Normal
distribution to this data, which will return another Normal
object:
>>> Normal.fit(data)
Normal(loc=1.0250822420920338, scale=1.9376400770300832)
As you can see, the values are slightly different from the moments in the data.
This is due to the fact that the fit
method returns the Maximum Likelihood Estimator (MLE)
for the data, and is thus the result of an optimisation (using scipy.optimize).
We can also fix the value for some parameters if we know them:
>>> Normal.fit(data, loc=1)
Normal(loc=1.0, scale=1.9377929687500024)
One of the most powerful features of pykelihood
is the ability to fit arbitrary distributions.
For instance, say our data has a linear trend with a very little gaussian noise we would like to capture:
>>> import numpy as np
>>> data = np.linspace(-1, 1, 365) + np.random.normal(0, 0.001, 365)
>>> data[:10]
array([-0.99802364, -0.99503679, -0.98900434, -0.98277981, -0.979487 ,
-0.97393519, -0.96853445, -0.96149152, -0.95564004, -0.95054887])
If we try to fit this without a trend, the resulting distribution will miss out on most of the information:
>>> Normal.fit(data)
Normal(loc=-3.6462053656578005e-05, scale=0.5789668679237372)
Let's fit a Normal
distribution with a trend in the loc parameter:
>>> from pykelihood import kernels
>>> Normal.fit(data, loc=kernels.linear(np.arange(365)))
Normal(loc=linear(a=-1.0000458359290572, b=0.005494714384381866), scale=0.0010055323717468906)
kernels.linear(X)
builds a linear model in the form a + bX where a and b are parameters to
be optimised for, and X is some covariate used to fit the data. If we assume the data were daily observations,
then we find all the values we expected: -1 was the value on the first day, 0.05 was the daily increment
(2 / 365 = 0.05), and there was a noise with std deviation 0.001.
Another useful feature of pykelihood
is the ability to customize the log-likelihood function with penalties, conditioning methods, stability conditions, etc. Most statistics-related packages offer to fit data using the standard opposite log-likelihood function, or in the best case, preselected models. To our knowledge, pykelihood
is the only Python package allowing to easily customize the log-likelihood function to fit data.
>>> data = np.random.normal(0, 1, 1000)
>>> def lassolike_score(distribution, data):
... return -np.sum(distribution.logpdf(data)) + 5 * np.abs(distribution.loc())
...
>>> std_fit = Normal.fit(data)
>>> cond_fit = Normal.fit(data, score=lassolike_score)
>>> std_fit.loc.value
-0.010891307380632494
>>> cond_fit.loc.value
-0.006210406541824357
Likelihood based inference relies on parameter estimation. This is why it's important to quantify the sensitivity of a
chosen model to each of those parameters. The stats_utils
module in pykelihood
includes the Profiler
class that allows to link a model to a set of observations by providing goodness of fit metrics and profiles for all
parameters.
>>> from pykelihood.profiler import Profiler
>>> from pykelihood.distributions import GEV
>>> fitted_gev = GEV.fit(data, loc=kernels.linear(np.linspace(-1, 1, len(data))))
>>> ll = Profiler(fitted_gev, data, inference_confidence=0.99) # level of confidence for tests
>>> ll.AIC # the standard fit is without trend
{'AIC MLE': -359.73533182968777, 'AIC Standard MLE Fit': 623.9896838880583}
>>> ll.profiles.keys()
dict_keys(['loc_a', 'loc_b', 'scale', 'shape'])
>>> ll.profiles["shape"].head(5)
loc_a loc_b scale shape likelihood
0 -0.000122 1.000812 0.002495 -0.866884 1815.022132
1 -0.000196 1.000795 0.001964 -0.662803 1882.043541
2 -0.000283 1.000477 0.001469 -0.458721 1954.283256
3 -0.000439 1.000012 0.000987 -0.254640 2009.740282
4 -0.000555 1.000016 0.000948 -0.050558 1992.812843
Confidence intervals can be computed for specified metrics:
>>> def metric(gev): return gev.loc()
>>> ll.confidence_interval(metric)
[-4.160287666875364, 4.7039931595123825]
Poetry is used to manage pykelihood
's dependencies and build system. To install
Poetry, you can refer to the installation instructions, but it boils
down to running:
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
To configure your environment to work on pykelihood, run:
git clone https://www.github.com/OpheliaMiralles/pykelihood # or any other clone method cd pykelihood poetry install
This will create a virtual environment for the project and install the required dependencies. To activate the virtual
environment, be sure to run poetry shell
prior to executing any code.
We also use the pre-commit library which adds git hooks to the repository. These must be installed with:
pre-commit install
Some parts of the code base use the matplotlib and hawkeslib package, but are for now not required to run most of the code, including the tests.
Tests are run using pytest. To run all tests, navigate to the root folder or the
tests
folder and type pytest
.