improve the hist reference guide (#1002)
MarcSkovMadsen authored Mar 16, 2023
1 parent cbab46c commit 229e93d
Showing 2 changed files with 83 additions and 7 deletions.
73 changes: 67 additions & 6 deletions examples/reference/pandas/hist.ipynb
@@ -6,14 +6,16 @@
"metadata": {},
"outputs": [],
"source": [
"import hvplot.pandas # noqa"
"import hvplot.pandas # noqa\n",
"\n",
"# hvplot.extension(\"matplotlib\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`hist` is often a good way to start looking at data to get a sense of the distribution. Similar methods include [`kde`](kde.ipny) (also available as `density`)."
"`hist` is often a good way to start looking at continuous data to get a sense of the distribution. Similar methods include [`kde`](kde.ipynb) (also available as `density`)."
]
},
{
@@ -22,9 +24,18 @@
"metadata": {},
"outputs": [],
"source": [
"from bokeh.sampledata.autompg import autompg_clean as df\n",
"from bokeh.sampledata.autompg import autompg_clean\n",
"\n",
"df.sample(n=5)"
"autompg_clean.sample(n=5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"autompg_clean.hvplot.hist(\"weight\")"
]
},
{
@@ -40,7 +51,57 @@
"metadata": {},
"outputs": [],
"source": [
"df.hvplot.hist(\"weight\", by=\"origin\", subplots=True, width=250)"
"autompg_clean.hvplot.hist(\"weight\", by=\"origin\", subplots=True, width=250)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also plot histograms of *datetime* data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from bokeh.sampledata.commits import data as commits\n",
"\n",
"commits = commits.reset_index().sort_values(\"datetime\")\n",
"commits.head(3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"commits.hvplot.hist(\n",
" \"datetime\",\n",
" bin_range=(pd.Timestamp('2012-11-30'), pd.Timestamp('2017-05-01')),\n",
" bins=54, \n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to plot the distribution of a categorical column, you can compute it with the Pandas method `value_counts` and plot the result with `.hvplot.bar`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"autompg_clean[\"mfr\"].value_counts().hvplot.bar(invert=True, flip_yaxis=True, height=500)"
]
}
],
@@ -51,5 +112,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
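The datetime example passes `bins=54` together with an explicit `bin_range`. Assuming the histogram splits `bin_range` into equal-width bins (the usual NumPy-style behaviour; the HoloViews binning code for datetimes is not shown in this commit), the pandas-only sketch below estimates how wide each bin ends up. The range spans 1613 days, so each bin covers a little under 30 days: close to monthly, but not aligned with calendar month boundaries.

```python
import pandas as pd

# bin_range and bins copied from the notebook's datetime example
bin_range = (pd.Timestamp("2012-11-30"), pd.Timestamp("2017-05-01"))
bins = 54

# Assumption: the histogram uses `bins` equal-width bins over bin_range,
# i.e. bins + 1 evenly spaced edges from the start to the end of the range
edges = pd.date_range(bin_range[0], bin_range[1], periods=bins + 1)
bin_width = edges[1] - edges[0]

span = bin_range[1] - bin_range[0]
print(span.days)   # 1613
print(bin_width)   # a little under 30 days per bin
```

If calendar-aligned monthly bins matter, resampling the data before plotting may be a better fit than tuning `bins` by hand.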
17 changes: 16 additions & 1 deletion hvplot/plotting/core.py
@@ -1244,14 +1244,15 @@ def violin(self, y=None, by=None, **kwds):

def hist(self, y=None, by=None, **kwds):
"""
A `histogram` displays an approximate representation of the distribution of numerical data.
A `histogram` displays an approximate representation of the distribution of continuous data.
Reference: https://hvplot.holoviz.org/reference/pandas/hist.html
Parameters
----------
y : string or sequence
Field(s) in the *wide* data to compute the distribution(s) from.
The fields should contain continuous data, not categorical data.
by : string or sequence
Field(s) in the *long* data to group by.
bins : int, optional
@@ -1295,6 +1296,20 @@ def hist(self, y=None, by=None, **kwds):
df['two'] = df['one'] + np.random.randint(1, 7, 6000)
df.hvplot.hist(bins=12, alpha=0.5, color=["lightgreen", "pink"])
If you want to show the distribution of the values of a categorical column,
you can use the Pandas method `value_counts` together with `.hvplot.bar`, as shown below:

.. code-block::

    import hvplot.pandas
    import pandas as pd

    data = pd.DataFrame({
        "library": ["bokeh", "plotly", "matplotlib", "bokeh", "matplotlib", "matplotlib"]
    })
    data["library"].value_counts().hvplot.bar()

References
----------

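The new docstring example relies on `value_counts` turning a categorical column into a Series of frequencies. The pandas-only sketch below (no plotting, so it runs without hvplot) shows what `.hvplot.bar()` would receive for the docstring's toy data: category names as the index and counts as the values, ordered from most to least frequent. The `normalize=True` variant is an extra that the commit does not use.

```python
import pandas as pd

# Toy data copied from the docstring example added in this commit
data = pd.DataFrame({
    "library": ["bokeh", "plotly", "matplotlib", "bokeh", "matplotlib", "matplotlib"]
})

# Absolute frequencies: categories become the index, counts the values,
# sorted in descending order of frequency
counts = data["library"].value_counts()

# Relative frequencies (shares that sum to 1); not used in the commit itself
shares = data["library"].value_counts(normalize=True)

print(counts)
print(shares)
```

Calling `.hvplot.bar()` on `counts` should then draw one bar per library. The notebook's `invert=True, flip_yaxis=True` options appear to turn this into horizontal bars with the most frequent category at the top, which helps when a column such as `mfr` has many categories.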