Commit

Modernize categorical plotting and refactor stripplot (#2413)
* Proof of principle refactored stripplot passing all tests

* Improve handling of categorical dates

* Improve automatic categorical orientation with dates

* Add more continuous datetime variable to long_df fixture

* Begin updating stripplot tests

* Update more stripplot tests

* Add test for single strip, with hue

* Fix infer_orient argcheck

* Add tests for flat and wide data in stripplot

* Refactor hue backcompat into a plotter class method, make optional

* Enable new default coloring rules in stripplot

* Update catplot to use new stripplot function

* Update assert_plots_equal to test all collections

* Clean up some comments

* Remove old stripplot code

* Fix typo

* Add explicit categorical order to VectorPlotter._attach

* Modify the implementation of categorical data handling to permit unshared facets

* Improve integration of axis converters with unshared facet grids

* Fix ordering by category dtype

* Fix catplot point sizes

* Add (un)fixed_scale

* Fix plot equality assertion

* Disable tests that hit matplotlib bug due to incomplete implementation

* Improve test coverage

* Move forced/ordered categorical scaling logic to core

* Add core-level tests for scale method(s)

* Reduce use of special attributes, add formatter and hue_norm

* Update stripplot API examples

* Re-enable kwarg deprecation warning

* Fix log scaled stripplot

* Fixed log-scaled categorical axis

* Don't jitter single strips
mwaskom authored Jan 19, 2021
1 parent aa488f0 commit a809747
Showing 13 changed files with 1,700 additions and 420 deletions.
4 changes: 2 additions & 2 deletions doc/docstrings/histplot.ipynb
@@ -461,9 +461,9 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "seaborn-refactor (py38)",
+"display_name": "seaborn-py38-latest",
 "language": "python",
-"name": "seaborn-refactor"
+"name": "seaborn-py38-latest"
 },
 "language_info": {
 "codemirror_mode": {
313 changes: 313 additions & 0 deletions doc/docstrings/stripplot.ipynb
@@ -0,0 +1,313 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide"
]
},
"outputs": [],
"source": [
"import seaborn as sns\n",
"sns.set_theme(style=\"whitegrid\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"Assigning a single numeric variable shows its univariate distribution with points randomly \"jittered\" on the other axis:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tips = sns.load_dataset(\"tips\")\n",
"sns.stripplot(data=tips, x=\"total_bill\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"Assigning a second variable splits the strips of points to compare categorical levels of that variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.stripplot(data=tips, x=\"total_bill\", y=\"day\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"Show vertically-oriented strips by swapping the assignment of the categorical and numerical variables:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.stripplot(data=tips, x=\"day\", y=\"total_bill\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"Prior to version 0.12, the levels of the categorical variable had different colors. To get the same effect, assign the `hue` variable explicitly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.stripplot(data=tips, x=\"total_bill\", y=\"day\", hue=\"day\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"Or you can assign a distinct variable to `hue` to show a multidimensional relationship:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.stripplot(data=tips, x=\"total_bill\", y=\"day\", hue=\"sex\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"If the `hue` variable is numeric, it will be mapped with a quantitative palette by default (this was not the case prior to version 0.12):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.stripplot(data=tips, x=\"total_bill\", y=\"day\", hue=\"size\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"Use `palette` to control the color mapping, including forcing a categorical mapping by passing the name of a qualitative palette:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.stripplot(data=tips, x=\"total_bill\", y=\"day\", hue=\"size\", palette=\"deep\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"By default, the different levels of the `hue` variable are intermingled in each strip, but setting `dodge=True` will split them:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.stripplot(data=tips, x=\"total_bill\", y=\"day\", hue=\"sex\", dodge=True)"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"The random jitter can be disabled by setting `jitter=False`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.stripplot(data=tips, x=\"total_bill\", y=\"day\", hue=\"sex\", dodge=True, jitter=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If plotting in wide-form mode, each column of the dataframe will be mapped to both `x` and `hue`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.stripplot(data=tips)"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"To change the orientation while in wide-form mode, pass `orient` explicitly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.stripplot(data=tips, orient=\"h\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"The `orient` parameter is also useful when both axis variables are numeric, as it will resolve ambiguity about which dimension to group (and jitter) along:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.stripplot(data=tips, x=\"total_bill\", y=\"size\", orient=\"h\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"By default, the categorical variable will be mapped to discrete indices with a fixed scale (0, 1, ...), even when it is numeric:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.stripplot(\n",
" data=tips.query(\"size in [2, 3, 5]\"),\n",
" x=\"total_bill\", y=\"size\", orient=\"h\",\n",
")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"To disable this behavior and use the original scale of the variable, set `fixed_scale=False`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.stripplot(\n",
" data=tips.query(\"size in [2, 3, 5]\"),\n",
" x=\"total_bill\", y=\"size\", orient=\"h\",\n",
" fixed_scale=False,\n",
")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"Further visual customization can be achieved by passing matplotlib keyword arguments:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.stripplot(\n",
" data=tips, x=\"total_bill\", y=\"day\", hue=\"time\",\n",
" jitter=False, s=20, marker=\"D\", linewidth=1, alpha=.1,\n",
")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"To make a plot with multiple facets, it is safer to use :func:`catplot` than to work with :class:`FacetGrid` directly, because :func:`catplot` will ensure that the categorical and hue variables are properly synchronized in each facet:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.catplot(data=tips, x=\"time\", y=\"total_bill\", hue=\"sex\", col=\"day\", aspect=.5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "seaborn-py38-latest",
"language": "python",
"name": "seaborn-py38-latest"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
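Note on the `fixed_scale` cells in the notebook above: the following is a minimal sketch, in plain Python rather than seaborn, of what it means to map a categorical variable to discrete indices (0, 1, ...) instead of keeping its original scale. The helper name `categorical_positions` is hypothetical and not part of seaborn, and sorting the numeric levels is an assumption of the sketch.

```python
def categorical_positions(values, fixed_scale=True):
    """Return an axis position for each value (hypothetical helper, not seaborn API)."""
    if not fixed_scale:
        # Keep the variable's own numeric scale: 2, 3, 5 stay at 2, 3, 5.
        return [float(v) for v in values]
    # Assumption: numeric levels are ordered by sorting the unique values.
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    # Each level is placed at its ordinal index: 0, 1, 2, ...
    return [float(index[v]) for v in values]


sizes = [2, 3, 5, 3, 2]
fixed = categorical_positions(sizes)
native = categorical_positions(sizes, fixed_scale=False)
print(fixed)   # [0.0, 1.0, 2.0, 1.0, 0.0]
print(native)  # [2.0, 3.0, 5.0, 3.0, 2.0]
```

With the fixed scale, the gap between sizes 3 and 5 disappears because only the rank of each level matters; with the original scale, the spacing between strips reflects the numeric values.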
4 changes: 4 additions & 0 deletions doc/releases/v0.12.0.txt
@@ -6,6 +6,10 @@ v0.12.0 (Unreleased)

- |Fix| |Enhancement| Improved robustness to missing data, including additional support for the `pd.NA` type (:pr:`2417`).

- TODO function-specific categorical enhancements, including:

- In :func:`stripplot`, a "strip" with a single observation will be plotted without jitter (:pr:`2413`)

- Made `scipy` an optional dependency and added `pip install seaborn[all]` as a method for ensuring the availability of compatible `scipy` and `statsmodels` libraries at install time. This has a few minor implications for existing code, which are explained in the GitHub pull request (:pr:`2398`).

- Following `NEP29 <https://numpy.org/neps/nep-0029-deprecation_policy.html>`_, dropped support for Python 3.6 and bumped the minimally-supported versions of the library dependencies.
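
The single-observation rule for :func:`stripplot` noted above can be illustrated with a small standalone sketch. It is not seaborn's implementation: the helper `jitter_offsets`, the default half-width of 0.1, and the division by the number of hue levels when dodging are all assumptions made for illustration.

```python
import random


def jitter_offsets(n_points, jitter=True, n_hue_levels=1, dodge=False, rng=None):
    """Return one offset per point along the categorical axis (hypothetical helper)."""
    rng = rng or random.Random()
    # A strip with a single observation (or jitter disabled) is not moved.
    if n_points <= 1 or jitter is False:
        return [0.0] * n_points
    # Assumption: jitter=True means a half-width of 0.1 axis units.
    half_width = 0.1 if jitter is True else float(jitter)
    # Assumption: dodged strips are narrower, so the jitter shrinks with them.
    if dodge and n_hue_levels > 1:
        half_width /= n_hue_levels
    return [rng.uniform(-half_width, half_width) for _ in range(n_points)]


single = jitter_offsets(1)
many = jitter_offsets(50, rng=random.Random(0))
dodged = jitter_offsets(50, n_hue_levels=2, dodge=True, rng=random.Random(0))
print(single)  # [0.0]
```

Leaving a lone point at the center of its strip avoids suggesting spread where there is none; strips with several points are still jittered within a bounded range.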