GitHub - hmmkash/HASS-data-detective: Explore and analyse your Home Assistant data

Introduction

The HASS-data-detective package, which we may also refer to as 'detective' or 'data-detective', provides classes and functions to help you explore and analyse the data in your Home Assistant database. If you are using Hassio, it will automatically discover your sqlite database and by default collect information about the entities in your database. The recommended workflow is to then load the database content into a Pandas dataframe using the fetch_all_data method. This is recommended as all of the work of formatting the data for analysis is done up front, but it could take a couple of minutes. However if you have a very large database and cannot load it into a Pandas dataframe due to memory limits, you will have to adopt a different workflow where you query and process only the data you are interested in. Usage of detective.ipynb shows examples of using the detective with both of these workflows.

Note that not all python packages can be installed on Hassio yet - scipy is in this category. Notable packages which have scipy as a dependency include Seaborn.

Try out detective online

You can try out detective online without installing anything. If you click on the 'launch binder' button above, detective will be started in a Docker container online using the Binderhub service. Run the Usage of detective notebook to explore detective, and you can also use the Upload button to upload your own home-assistant_v2.db database file for analysis. Note that all data is deleted when the container closes down, so this service is just for trying out detective.

Installation on you machine

You can either: pip install HASS-data-detective for the latest released version from pypi, or pip install git+https://github.com/robmarkcole/HASS-data-detective.git --upgrade for the bleeding edge version from github. Alternatively if you wish to contribute to the development of detective, clone this repository and install in editable mode with pip install -e .

Initialise HassDatabase

Detective first needs to know the location of your database in order to initialise the HassDatabase object which handles communication with your database. If you are using the default sqlite database and have the home-assistant_v2.db file locally just supply the path:

from detective.core import HassDatabase

db = detective.HassDatabase('sqlite:////' + 'path_to/home-assistant_v2.db')

If you are running a database server for Home Assistant (e.g. mysql) you need to initialise the HassDatabase directly with the correct connection string, for example:

db = HassDatabase("mysql://scott:tiger@localhost/test")

Alternatively if you are using detective on Hassio, there are two possible ways to initialise the HassDatabase. The easiest is with db_from_hass_config. This will initialise a HassDatabase based on the the information found in your Home Assistant config folder, which it will automatically discover:

from detective.core import db_from_hass_config

db = db_from_hass_config() # Auto detect

Alternatively it's possible to pass the path in:

db = db_from_hass_config("/home/homeassistant/config") # Pass in path to config

Initialisation of HassDatabase accepts keyword arguments to influence how the object is initialised:

Argument	Description
`fetch_entities`	Boolean to indicate if we should fetch the entities when constructing the database. If not, you will have to call `db.fetch_entities()` at a later stage before being able to use `self.entities` and `self.domains`.

By default with fetch_entities=True, on initialisation HassDatabase will query the database and list the available domains and their entities in its domains and entities attributes:

db.domains

['persistent_notification',
 'remote',
 'script',
 'camera',
 'group',
 'light',
 'zone',
 'alarm_control_panel',
 'switch',
 'automation',
 'media_player',
 'device_tracker',
 'binary_sensor',
 'sensor',
 'input_select',
 'updater',
 'sun']

The attribute entities is a dictionary accessed via a domain name:

db.entities['binary_sensor']

['binary_sensor.motion_at_home',
 'binary_sensor.living_room_motion_sensor',
 'binary_sensor.in_bed_bayesian',
 'binary_sensor.hall_motion_sensor',
 'binary_sensor.bedroom_motion_sensor',
 'binary_sensor.blink_armed_status',
 'binary_sensor.blink_blink_camera_percy_motion_enabled',
 'binary_sensor.workday_sensor',
 'binary_sensor.living_room_nest_protect_online',
 'binary_sensor.bayesianbinary']

Simple query

Note that at this point we still haven't downloaded any actual data. Lets query a single sensor using SQL and demonstrate the data formatting steps performed by detective, in order to convery raw data into a format suitable for plotting and analysing:

query = text(
    """
    SELECT state, last_changed
    FROM states
    WHERE entity_id in ('sensor.hall_light_sensor')
    AND NOT state='unknown'
    """
    )
response = db.perform_query(query)

df = pd.DataFrame(response.fetchall()) # Convert the response to a dataframe

df.columns = ['state', 'last_changed'] # Set the columns

df = df.set_index('last_changed') # Set the index on datetime

df.index = pd.to_datetime(df.index) # Convert string to datetime

df = df.mask(df.eq('None')).dropna().astype(float) #  Convert state strings to floats for plotting

We can then plot the data:

df.plot(figsize=(16, 6));

Fetch all raw data

Use fetch_all_data to cache all your raw database data into a Pandas dataframe in memory. It is useful to keep this raw data in case you mess up your processed data and don't want to go through the process of fetching the raw data all over again.

%%time
db.fetch_all_data()

Querying the database, this could take a while
master_df created successfully.
CPU times: user 11.7 s, sys: 12.8 s, total: 24.4 s
Wall time: 1min 1s

We now have the raw data in a Pandas dataframe on the master_df attribute. We must use another class to process this data into a format suitable for plotting and processing. There are separate classes for numerical and binary sensors, which allows them to both implement a plot method correctly.

NumericalSensors class

The NumericalSensors class is for formatting numerical data. Create a dataframe with formatted numerical data like so:

sensors_num_df = detective.NumericalSensors(db.master_df)

We can access the list of sensor entities:

sensors_num_df.entities[0:10]

['sensor.next_train_to_wat',
 'sensor.next_bus_to_new_malden',
 'sensor.darksky_sensor_temperature',
 'sensor.darksky_sensor_precip_probability',
 'sensor.iphone_battery_level',
 'sensor.robins_iphone_battery_level',
 'sensor.blink_blink_camera_percy_temperature',
 'sensor.blink_blink_camera_percy_notifications',
 'sensor.next_train_in',
 'sensor.home_to_waterloo']

Now lets look at the Pandas dataframe which is on the data attribute:

sensors_num_df.data.head()

entity	sensor.average_indoor_temp	sensor.bedroom_light_sensor	sensor.bedroom_temperature	sensor.blink_blink_camera_percy_notifications	sensor.blink_blink_camera_percy_temperature	sensor.bme680air_qual	sensor.bme680humidity	sensor.bme680pressure	sensor.bme680temperature	sensor.breaches_fredallcardgmailcom	...	sensor.next_train_to_wat	sensor.next_train_to_wim	sensor.remote_living_room_button	sensor.robins_iphone_battery_level	sensor.speedtest_download	sensor.volume_used_volume_1	sensor.wipy_humidity	sensor.wipy_memory	sensor.wipy_temperature	sensor.work_to_home
last_changed
2017-10-28 06:48:00.143377	20.2	15621.0	18.89	1.0	21.0	98.51	43.58	1033.93	21.07	0.0	...	1125.0	87.0	1002.0	94.0	36.37	20.7	14.0	38112.0	32.0	25.0
2017-10-28 06:48:01.060922	20.2	15621.0	18.89	1.0	21.0	98.51	43.50	1033.93	21.07	0.0	...	1125.0	87.0	1002.0	94.0	36.37	20.7	14.0	38112.0	32.0	25.0
2017-10-28 06:48:01.069416	20.2	15621.0	18.89	1.0	21.0	98.51	43.50	1033.93	21.06	0.0	...	1125.0	87.0	1002.0	94.0	36.37	20.7	14.0	38112.0	32.0	25.0
2017-10-28 06:48:01.076784	20.2	15621.0	18.89	1.0	21.0	98.51	43.50	1033.95	21.06	0.0	...	1125.0	87.0	1002.0	94.0	36.37	20.7	14.0	38112.0	32.0	25.0
2017-10-28 06:48:01.079950	20.2	15621.0	18.89	1.0	21.0	98.54	43.50	1033.95	21.06	0.0	...	1125.0	87.0	1002.0	94.0	36.37	20.7	14.0	38112.0	32.0	25.0

5 rows × 52 columns

Lets check for correlations in the data:

corrs = sensors_num_df.correlations()

corrs[(corrs['value'] > 0.8) | (corrs['value'] < -0.8)]

	value
sensor.next_train_in-sensor.next_train_to_wim	0.999961
sensor.iphone_battery_level-sensor.robins_iphone_battery_level	0.923446
sensor.bme680air_qual-sensor.bme680pressure	0.862630
sensor.mean_temperature-sensor.bedroom_temperature	0.814340
sensor.living_room_temperature-sensor.bme680temperature	0.801827
sensor.bme680pressure-sensor.darksky_sensor_temperature	-0.810146
sensor.bme680humidity-sensor.bme680pressure	-0.862619
sensor.memory_usage_real-sensor.volume_used_volume_1	-0.902779
sensor.bme680humidity-sensor.bme680air_qual	-0.999989

Unsurprisingly the mean temperature is strongly correlated with all of the temperature sensors. Interestingly my iphone battery level is somewhat inversely correlated with the travel time from home to waterloo, which gets longer late at night when my battery level is more likely to be low.

Plot numerical sensor data

We can pass a single entity to plot:

sensors_num_df.plot('sensor.darksky_sensor_temperature')

We can pass a list of entities to plot:

to_plot = ['sensor.living_room_temperature',
           'sensor.bedroom_temperature',
           'sensor.darksky_sensor_temperature']

sensors_num_df.plot(to_plot)

BinarySensors class

The BinarySensors class is for binary sensor data with on/off states.

sensors_binary_df = detective.BinarySensors(db.master_df)

sensors_binary_df.entities

['binary_sensor.workday_sensor',
 'binary_sensor.blink_blink_camera_percy_motion_enabled',
 'binary_sensor.living_room_nest_protect_online',
 'binary_sensor.blink_armed_status',
 'binary_sensor.hall_motion_sensor',
 'binary_sensor.bedroom_motion_sensor',
 'binary_sensor.living_room_motion_sensor',
 'binary_sensor.motion_at_home',
 'binary_sensor.bayesianbinary',
 'binary_sensor.in_bed_bayesian']

We can plot a single binary sensor:

sensors_binary_df.plot('binary_sensor.motion_at_home')

OK now we have demonstrated the basic classes and functionality of detective, lets move on to some analysis!

Day of week analysis

Lets analyse the motion_at_home binary sensor data. We first create features from the raw data for the day-of-the-week and time categories, then perform analysis on these features.

from detective.time is_weekday, time_category

motion_df = sensors_binary_df.data[['binary_sensor.motion_at_home']] # Must pass a list to return correctly indexed df

motion_df['weekday'] = motion_df.index.weekday_name # get the weekday name

motion_df['is_weekday'] = motion_df.index.map(lambda x: is_weekday(x)) # determine if day is a weekday or not

motion_df = motion_df[motion_df['binary_sensor.motion_at_home'] == True] # Keep only true detection events

motion_df['time_category'] = motion_df.index.map(lambda x: time_category(x)) # Generate a time_category feature

motion_df.head()

entity	binary_sensor.motion_at_home	weekday	is_weekday	time_category
last_changed
2017-08-07 20:08:17.810800	True	Monday	True	evening
2017-08-07 20:08:26.921077	True	Monday	True	evening
2017-08-07 20:10:20.017217	True	Monday	True	evening
2017-08-07 20:11:31.024414	True	Monday	True	evening
2017-08-07 20:12:02.027471	True	Monday	True	evening

Lets now do a groupby operation:

motion_df['binary_sensor.motion_at_home'].groupby(motion_df['is_weekday']).describe()['count']

is_weekday
False     4452
True     10862
Name: count, dtype: object

motion_df_gb = motion_df['binary_sensor.motion_at_home'].groupby([motion_df['weekday'], motion_df['time_category']]).sum().unstack()

motion_df_gb.fillna(value=0, inplace=True) # Replace NaN with 0

motion_df_gb = motion_df_gb.astype('int') # Ints rather than floats

motion_df_gb = motion_df_gb.T

motion_df_gb

weekday	Friday	Monday	Saturday	Sunday	Thursday	Tuesday	Wednesday
time_category
daytime	1000	690	962	631	844	880	800
evening	394	599	239	496	453	532	545
morning	839	688	1047	833	664	655	619
night	92	93	131	113	163	149	163

We see that there is a lot of activity on saturday mornings, when I hoover the house. We can also visualise this data using Seaborn.

Seaborn

Seaborn is a python package for doing statistical plots. Unfortunately it is not yet supported on Hassio, but if you are on a Mac or PC you can use it like follows:

import seaborn as sns

fig, ax = plt.subplots(figsize=(14, 6))
days_list = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
times_list = ['morning', 'daytime', 'evening', 'night']
ax = sns.heatmap(motion_df_gb[days_list].loc[times_list], annot=True, linewidths=.5, fmt="d", ax=ax, cmap='Reds');
ax.set_title('Activity at home by day and time category')
#fig.savefig('heatmap.jpg')

Auth helpers

When querying the database, you might end up with user IDs and refresh token IDs. We've included a helper to help load the auth from Home Assistant and help you process this data.

from detective.auth import auth_from_hass_config

auth = auth_from_hass_config()

auth.users

{
  "user-id": {
    "id": "id of user",
    "name": "Name of user",
  }
}

auth.refresh_tokens

"refresh-token-id": {
  "id": "id of token",
  "user": "user object related to token",
  "client_name": "Name of client that created token",
  "client_id": "ID of client that created token",
}

> auth.user_name('some-user-id')
Paulus

Name		Name	Last commit message	Last commit date
Latest commit History 152 Commits
.github		.github
detective		detective
docs/images		docs/images
notebooks		notebooks
tests		tests
util		util
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
README.md		README.md
requirements_test.txt		requirements_test.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction

Try out detective online

Installation on you machine

Initialise HassDatabase

Simple query

Fetch all raw data

NumericalSensors class

Plot numerical sensor data

BinarySensors class

Day of week analysis

Seaborn

Auth helpers

About

Releases

Packages

Languages

License

hmmkash/HASS-data-detective

Folders and files

Latest commit

History

Repository files navigation

Introduction

Try out detective online

Installation on you machine

Initialise HassDatabase

Simple query

Fetch all raw data

NumericalSensors class

Plot numerical sensor data

BinarySensors class

Day of week analysis

Seaborn

Auth helpers

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages