The HASS-data-detective package, also referred to as 'detective' or 'data-detective', provides classes and functions to help you explore and analyse the data in your Home Assistant database. If you are using Hassio, it will automatically discover your SQLite database and by default collect information about the entities in your database. The recommended workflow is to load the database contents into a Pandas dataframe using the `fetch_all_data` method: all of the work of formatting the data for analysis is done up front, although this can take a couple of minutes. However, if you have a very large database that cannot be loaded into a Pandas dataframe within your memory limits, you will have to adopt a different workflow where you query and process only the data you are interested in. The `Usage of detective.ipynb` notebook shows examples of using detective with both of these workflows.
Note that not all Python packages can be installed on Hassio yet; scipy is in this category. Notable packages which have scipy as a dependency include Seaborn.
You can try out detective online without installing anything. If you click on the 'launch binder' button above, detective will be started in a Docker container online using the BinderHub service. Run the `Usage of detective` notebook to explore detective; you can also use the `Upload` button to upload your own `home-assistant_v2.db` database file for analysis. Note that all data is deleted when the container shuts down, so this service is only for trying out detective.
You can either `pip install HASS-data-detective` for the latest released version from PyPI, or `pip install git+https://github.com/robmarkcole/HASS-data-detective.git --upgrade` for the bleeding-edge version from GitHub. Alternatively, if you wish to contribute to the development of detective, clone this repository and install it in editable mode with `pip install -e .`
Detective first needs to know the location of your database in order to initialise the `HassDatabase` object, which handles communication with your database. If you are using the default SQLite database and have the `home-assistant_v2.db` file locally, just supply the path:

```python
from detective.core import HassDatabase

db = HassDatabase('sqlite:////path_to/home-assistant_v2.db')
```
If you are running a database server for Home Assistant (e.g. MySQL), initialise the `HassDatabase` directly with the correct connection string, for example:

```python
db = HassDatabase("mysql://scott:tiger@localhost/test")
```
Alternatively, if you are using detective on Hassio, there are two possible ways to initialise the `HassDatabase`. The easiest is with `db_from_hass_config`, which initialises a `HassDatabase` based on the information found in your Home Assistant config folder, which it discovers automatically:

```python
from detective.core import db_from_hass_config

db = db_from_hass_config()  # Auto detect
```
Alternatively, it is possible to pass the path in:

```python
db = db_from_hass_config("/home/homeassistant/config")  # Pass in path to config
```
Initialisation of `HassDatabase` accepts keyword arguments to influence how the object is initialised:

| Argument | Description |
|---|---|
| `fetch_entities` | Boolean to indicate whether the entities should be fetched when constructing the database object. If `False`, you will have to call `db.fetch_entities()` at a later stage before being able to use `db.entities` and `db.domains`. |
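For example, a minimal sketch of deferring the entity fetch (the connection string is a placeholder):

```python
from detective.core import HassDatabase

# Skip the entity query while constructing the object
db = HassDatabase('sqlite:////path_to/home-assistant_v2.db', fetch_entities=False)

# Populate db.entities and db.domains later, when they are needed
db.fetch_entities()
```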
By default, with `fetch_entities=True`, on initialisation `HassDatabase` will query the database and list the available domains and their entities in its `domains` and `entities` attributes:
```python
db.domains
```

```
['persistent_notification',
 'remote',
 'script',
 'camera',
 'group',
 'light',
 'zone',
 'alarm_control_panel',
 'switch',
 'automation',
 'media_player',
 'device_tracker',
 'binary_sensor',
 'sensor',
 'input_select',
 'updater',
 'sun']
```
The `entities` attribute is a dictionary accessed via a domain name:
```python
db.entities['binary_sensor']
```

```
['binary_sensor.motion_at_home',
 'binary_sensor.living_room_motion_sensor',
 'binary_sensor.in_bed_bayesian',
 'binary_sensor.hall_motion_sensor',
 'binary_sensor.bedroom_motion_sensor',
 'binary_sensor.blink_armed_status',
 'binary_sensor.blink_blink_camera_percy_motion_enabled',
 'binary_sensor.workday_sensor',
 'binary_sensor.living_room_nest_protect_online',
 'binary_sensor.bayesianbinary']
```
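Since `domains` and `entities` are plain Python objects, you can use them programmatically. For example, a quick sketch counting the entities in each domain:

```python
# Print the number of entities registered under each domain
for domain in db.domains:
    print(domain, len(db.entities[domain]))
```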
Note that at this point we still haven't downloaded any actual data. Let's query a single sensor using SQL and demonstrate the data formatting steps performed by detective, in order to convert raw data into a format suitable for plotting and analysis:
```python
from sqlalchemy import text
import pandas as pd

query = text(
    """
    SELECT state, last_changed
    FROM states
    WHERE entity_id in ('sensor.hall_light_sensor')
    AND NOT state='unknown'
    """
)

response = db.perform_query(query)
df = pd.DataFrame(response.fetchall())  # Convert the response to a dataframe
df.columns = ['state', 'last_changed']  # Set the columns
df = df.set_index('last_changed')  # Set the index on datetime
df.index = pd.to_datetime(df.index)  # Convert string index to datetime
df = df.mask(df.eq('None')).dropna().astype(float)  # Convert state strings to floats for plotting
```
We can then plot the data:

```python
df.plot(figsize=(16, 6));
```
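Because state changes arrive at irregular intervals, you may want to resample the data onto a regular grid before plotting or analysis. A minimal sketch using standard Pandas (not part of detective):

```python
# Average the sensor readings into hourly buckets before plotting
hourly = df.resample('60min').mean()
hourly.plot(figsize=(16, 6));
```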
Use `fetch_all_data` to cache all of your raw database data in a Pandas dataframe in memory. It is useful to keep this raw data around: if you mess up your processed data, you won't have to go through the process of fetching the raw data all over again.
```python
%%time
db.fetch_all_data()
```

```
Querying the database, this could take a while
master_df created successfully.
CPU times: user 11.7 s, sys: 12.8 s, total: 24.4 s
Wall time: 1min 1s
```
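If fetching takes a long time, it can be worth persisting the raw dataframe to disk so you don't have to re-query after a restart. A minimal sketch using standard Pandas (the file name is arbitrary):

```python
# Save the raw data to disk...
db.master_df.to_pickle('master_df.pickle')

# ...and load it back in a later session
import pandas as pd
master_df = pd.read_pickle('master_df.pickle')
```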
We now have the raw data in a Pandas dataframe on the `master_df` attribute. We must use another class to process this data into a format suitable for plotting and processing. There are separate classes for numerical and binary sensors, which allows each to implement a `plot` method correctly.
The `NumericalSensors` class is for formatting numerical data. Create a dataframe with formatted numerical data like so:

```python
from detective.core import NumericalSensors

sensors_num_df = NumericalSensors(db.master_df)
```
We can access the list of sensor entities:
```python
sensors_num_df.entities[0:10]
```

```
['sensor.next_train_to_wat',
 'sensor.next_bus_to_new_malden',
 'sensor.darksky_sensor_temperature',
 'sensor.darksky_sensor_precip_probability',
 'sensor.iphone_battery_level',
 'sensor.robins_iphone_battery_level',
 'sensor.blink_blink_camera_percy_temperature',
 'sensor.blink_blink_camera_percy_notifications',
 'sensor.next_train_in',
 'sensor.home_to_waterloo']
```
Now let's look at the Pandas dataframe on the `data` attribute:

```python
sensors_num_df.data.head()
```
entity | sensor.average_indoor_temp | sensor.bedroom_light_sensor | sensor.bedroom_temperature | sensor.blink_blink_camera_percy_notifications | sensor.blink_blink_camera_percy_temperature | sensor.bme680air_qual | sensor.bme680humidity | sensor.bme680pressure | sensor.bme680temperature | sensor.breaches_fredallcardgmailcom | ... | sensor.next_train_to_wat | sensor.next_train_to_wim | sensor.remote_living_room_button | sensor.robins_iphone_battery_level | sensor.speedtest_download | sensor.volume_used_volume_1 | sensor.wipy_humidity | sensor.wipy_memory | sensor.wipy_temperature | sensor.work_to_home |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
last_changed | |||||||||||||||||||||
2017-10-28 06:48:00.143377 | 20.2 | 15621.0 | 18.89 | 1.0 | 21.0 | 98.51 | 43.58 | 1033.93 | 21.07 | 0.0 | ... | 1125.0 | 87.0 | 1002.0 | 94.0 | 36.37 | 20.7 | 14.0 | 38112.0 | 32.0 | 25.0 |
2017-10-28 06:48:01.060922 | 20.2 | 15621.0 | 18.89 | 1.0 | 21.0 | 98.51 | 43.50 | 1033.93 | 21.07 | 0.0 | ... | 1125.0 | 87.0 | 1002.0 | 94.0 | 36.37 | 20.7 | 14.0 | 38112.0 | 32.0 | 25.0 |
2017-10-28 06:48:01.069416 | 20.2 | 15621.0 | 18.89 | 1.0 | 21.0 | 98.51 | 43.50 | 1033.93 | 21.06 | 0.0 | ... | 1125.0 | 87.0 | 1002.0 | 94.0 | 36.37 | 20.7 | 14.0 | 38112.0 | 32.0 | 25.0 |
2017-10-28 06:48:01.076784 | 20.2 | 15621.0 | 18.89 | 1.0 | 21.0 | 98.51 | 43.50 | 1033.95 | 21.06 | 0.0 | ... | 1125.0 | 87.0 | 1002.0 | 94.0 | 36.37 | 20.7 | 14.0 | 38112.0 | 32.0 | 25.0 |
2017-10-28 06:48:01.079950 | 20.2 | 15621.0 | 18.89 | 1.0 | 21.0 | 98.54 | 43.50 | 1033.95 | 21.06 | 0.0 | ... | 1125.0 | 87.0 | 1002.0 | 94.0 | 36.37 | 20.7 | 14.0 | 38112.0 | 32.0 | 25.0 |
5 rows × 52 columns
Let's check for correlations in the data:

```python
corrs = sensors_num_df.correlations()
corrs[(corrs['value'] > 0.8) | (corrs['value'] < -0.8)]
```
 | value |
---|---|
sensor.next_train_in-sensor.next_train_to_wim | 0.999961 |
sensor.iphone_battery_level-sensor.robins_iphone_battery_level | 0.923446 |
sensor.bme680air_qual-sensor.bme680pressure | 0.862630 |
sensor.mean_temperature-sensor.bedroom_temperature | 0.814340 |
sensor.living_room_temperature-sensor.bme680temperature | 0.801827 |
sensor.bme680pressure-sensor.darksky_sensor_temperature | -0.810146 |
sensor.bme680humidity-sensor.bme680pressure | -0.862619 |
sensor.memory_usage_real-sensor.volume_used_volume_1 | -0.902779 |
sensor.bme680humidity-sensor.bme680air_qual | -0.999989 |
Unsurprisingly, the mean temperature is strongly correlated with all of the temperature sensors. Interestingly, my iPhone battery level is somewhat inversely correlated with the travel time from home to Waterloo, which gets longer late at night when my battery level is more likely to be low.
We can pass a single entity to plot:

```python
sensors_num_df.plot('sensor.darksky_sensor_temperature')
```
We can pass a list of entities to plot:

```python
to_plot = ['sensor.living_room_temperature',
           'sensor.bedroom_temperature',
           'sensor.darksky_sensor_temperature']

sensors_num_df.plot(to_plot)
```
The `BinarySensors` class is for binary sensor data with on/off states:

```python
from detective.core import BinarySensors

sensors_binary_df = BinarySensors(db.master_df)
```
```python
sensors_binary_df.entities
```

```
['binary_sensor.workday_sensor',
 'binary_sensor.blink_blink_camera_percy_motion_enabled',
 'binary_sensor.living_room_nest_protect_online',
 'binary_sensor.blink_armed_status',
 'binary_sensor.hall_motion_sensor',
 'binary_sensor.bedroom_motion_sensor',
 'binary_sensor.living_room_motion_sensor',
 'binary_sensor.motion_at_home',
 'binary_sensor.bayesianbinary',
 'binary_sensor.in_bed_bayesian']
```
We can plot a single binary sensor:

```python
sensors_binary_df.plot('binary_sensor.motion_at_home')
```
OK, now that we have demonstrated the basic classes and functionality of detective, let's move on to some analysis! We will analyse the `binary_sensor.motion_at_home` data: first we create day-of-the-week and time-category features from the raw data, then perform analysis on these features.
```python
from detective.time import is_weekday, time_category

motion_df = sensors_binary_df.data[['binary_sensor.motion_at_home']]  # Must pass a list to return a correctly indexed df
motion_df['weekday'] = motion_df.index.day_name()  # Get the weekday name (weekday_name in older Pandas)
motion_df['is_weekday'] = motion_df.index.map(lambda x: is_weekday(x))  # Determine if the day is a weekday or not
motion_df = motion_df[motion_df['binary_sensor.motion_at_home'] == True]  # Keep only true detection events
motion_df['time_category'] = motion_df.index.map(lambda x: time_category(x))  # Generate a time_category feature
```
```python
motion_df.head()
```
entity | binary_sensor.motion_at_home | weekday | is_weekday | time_category |
---|---|---|---|---|
last_changed | ||||
2017-08-07 20:08:17.810800 | True | Monday | True | evening |
2017-08-07 20:08:26.921077 | True | Monday | True | evening |
2017-08-07 20:10:20.017217 | True | Monday | True | evening |
2017-08-07 20:11:31.024414 | True | Monday | True | evening |
2017-08-07 20:12:02.027471 | True | Monday | True | evening |
Let's now do a groupby operation:

```python
motion_df['binary_sensor.motion_at_home'].groupby(motion_df['is_weekday']).describe()['count']
```

```
is_weekday
False     4452
True     10862
Name: count, dtype: object
```
```python
motion_df_gb = motion_df['binary_sensor.motion_at_home'].groupby([motion_df['weekday'], motion_df['time_category']]).sum().unstack()
motion_df_gb.fillna(value=0, inplace=True)  # Replace NaN with 0
motion_df_gb = motion_df_gb.astype('int')  # Ints rather than floats
motion_df_gb = motion_df_gb.T
motion_df_gb
```
weekday | Friday | Monday | Saturday | Sunday | Thursday | Tuesday | Wednesday |
---|---|---|---|---|---|---|---|
time_category | |||||||
daytime | 1000 | 690 | 962 | 631 | 844 | 880 | 800 |
evening | 394 | 599 | 239 | 496 | 453 | 532 | 545 |
morning | 839 | 688 | 1047 | 833 | 664 | 655 | 619 |
night | 92 | 93 | 131 | 113 | 163 | 149 | 163 |
We see that there is a lot of activity on Saturday mornings, when I hoover the house. We can also visualise this data using Seaborn. Seaborn is a Python package for statistical plotting. Unfortunately it is not yet supported on Hassio, but if you are on a Mac or PC you can use it as follows:
```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(14, 6))
days_list = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
times_list = ['morning', 'daytime', 'evening', 'night']
ax = sns.heatmap(motion_df_gb[days_list].loc[times_list], annot=True, linewidths=.5, fmt="d", ax=ax, cmap='Reds')
ax.set_title('Activity at home by day and time category')
#fig.savefig('heatmap.jpg')
```
When querying the database, you might end up with user IDs and refresh token IDs. We've included a helper to load the auth data from Home Assistant and help you process it.
```python
from detective.auth import auth_from_hass_config

auth = auth_from_hass_config()
auth.users
```

```
{
    "user-id": {
        "id": "id of user",
        "name": "Name of user",
    }
}
```
```python
auth.refresh_tokens
```

```
{
    "refresh-token-id": {
        "id": "id of token",
        "user": "user object related to token",
        "client_name": "Name of client that created token",
        "client_id": "ID of client that created token",
    }
}
```
```python
auth.user_name('some-user-id')
```

```
'Paulus'
```
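As a sketch of how this can be combined with a query, you could map user IDs returned from the database to readable names. This is an illustration only: whether the `events` table has a `context_user_id` column depends on your Home Assistant schema version.

```python
from sqlalchemy import text
import pandas as pd

# Fetch recent events together with the user context that fired them
# (assumes a schema with a context_user_id column on the events table)
query = text("SELECT event_type, context_user_id FROM events LIMIT 100")
events_df = pd.DataFrame(db.perform_query(query).fetchall(),
                         columns=['event_type', 'context_user_id'])

# Replace opaque user IDs with readable names via the auth helper
events_df['user_name'] = events_df['context_user_id'].map(
    lambda uid: auth.user_name(uid) if uid else None)
```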