Abstract
This is a major release from 0.5. The main objective of this release was to unify the API for categorical plots, which means that there are some relatively large API changes in some of the older functions. See below for details of those changes, which may break code written for older versions of seaborn. There are also some new functions (stripplot and countplot), numerous enhancements to existing functions, and bug fixes.

Additionally, the documentation has been completely revamped and expanded for the 0.6 release. The API docs page for each function now has multiple examples with embedded plots showing how to use the various options. These pages should be considered the most comprehensive resource for examples, and the tutorial pages are now streamlined and oriented towards a higher-level overview of the various features.

Changes and updates to categorical plots

In version 0.6, the "categorical" plots have been unified with a common API. This new category of functions groups together plots that show the relationship between one numeric variable and one or two categorical variables. This includes plots that show the distribution of the numeric variable in each bin (boxplot, violinplot, and stripplot) and plots that apply a statistical estimation within each bin (pointplot, barplot, and countplot). There is a new tutorial chapter that introduces these functions.

The categorical functions now each accept the same formats of input data and can be invoked in the same way. They can plot using long- or wide-form data, and can be drawn vertically or horizontally. When long-form data is used, the orientation of the plots is inferred from the types of the input data. Additionally, all functions natively take a hue variable to add a second layer of categorization.

With the (in some cases new) API, these functions can all be drawn correctly by FacetGrid.
However, factorplot can also now create faceted versions of any of these kinds of plots, so in most cases it will be unnecessary to use FacetGrid directly. By default, factorplot draws a point plot, but this is controlled by the kind parameter.

Here are details on what has changed in the process of unifying these APIs:

- Changes to boxplot and violinplot will probably be the most disruptive. Both functions maintain backwards-compatibility in terms of the kind of data they can accept, but the syntax has changed to be more similar to other seaborn functions. These functions are now invoked with x and/or y parameters that are either vectors of data or names of variables in a long-form DataFrame passed to the new data parameter. You can still pass wide-form DataFrames or arrays to data, but it is no longer the first positional argument. See the GitHub pull request for more information on these changes and the logic behind them.
- As pointplot and barplot can now plot with the major categorical variable on the y axis, the x_order parameter has been renamed to order.
- Added a hue argument to boxplot and violinplot, which allows for nested grouping of the plot elements by a third categorical variable. For violinplot, this nesting can also be accomplished by splitting the violins when there are two levels of the hue variable (using split=True). To make this functionality feasible, the ability to specify where the plots will be drawn in data coordinates has been removed. These plots are now drawn at set positions, like (and identical to) barplot and pointplot.
- Added a palette parameter to boxplot/violinplot. The color parameter still exists, but no longer does double-duty in accepting the name of a seaborn palette. palette supersedes color so that it can be used with a FacetGrid.

Along with these API changes, the following changes/enhancements were made to the plotting functions:

- The default rules for ordering the categories have changed.
Instead of automatically sorting the category levels, the plots now show the levels in the order they appear in the input data (i.e., the order given by Series.unique()). Order can be specified when plotting with the order and hue_order parameters. Additionally, when variables are pandas objects with a "categorical" dtype, the category order is inferred from the data object. This change also affects FacetGrid and PairGrid.
- Added the scale and scale_hue parameters to violinplot. These control how the width of the violins is scaled. The default is 'area', which is different from how the violins used to be drawn. Use scale='width' to get the old behavior.
- Used a different style for the box kind of interior plot in violinplot, which shows the whisker range in addition to the quartiles. Use inner='quartile' to get the old style.

New plotting functions

- Added the stripplot function, which draws a scatterplot where one of the variables is categorical. This plot has the same API as boxplot and violinplot. It is useful both on its own and when composed with one of these other plot kinds to show both the observations and underlying distribution.
- Added the countplot function, which uses a bar plot representation to show counts of variables in one or more categorical bins. This replaces the old approach of calling barplot without a numeric variable.

Other additions and changes

- The corrplot and underlying symmatplot functions have been deprecated in favor of heatmap, which is much more flexible and robust. These two functions are still available in version 0.6, but they will be removed in a future version.
- Added the set_color_codes function and the color_codes argument to set and set_palette. This changes the interpretation of shorthand color codes (e.g., "b", "g", "k") within matplotlib to use the values from one of the named seaborn palettes (e.g., "deep", "muted").
That makes it easier to have a more uniform look when using matplotlib functions directly with seaborn imported. This could be disruptive to existing plots, so it does not happen by default. It is possible this could change in the future.
- The color_palette function no longer trims palettes that are longer than 6 colors when passed into it.
- Added the as_hex method to color palette objects, to return a list of hex codes rather than rgb tuples.
- jointplot now passes additional keyword arguments to the function used to draw the plot on the joint axes.
- Changed the default linewidths in heatmap and clustermap to 0 so that larger matrices plot correctly. This parameter still exists and can be used to get the old effect of lines demarcating each cell in the heatmap (the old default linewidths was 0.5).
- heatmap and clustermap now automatically use a mask for missing values, which previously were shown with the "under" value of the colormap, following the default plt.pcolormesh behavior.
- Added the seaborn.crayons dictionary and the crayon_palette function to define colors from the 120 box (!) of Crayola crayons.
- Added the line_kws parameter to residplot to change the style of the lowess line, when used.
- Added open-ended **kwargs to the add_legend method on FacetGrid and PairGrid, which will pass additional keyword arguments through when calling the legend function on the Figure or Axes.
- Added the gridspec_kws parameter to FacetGrid, which allows for control over the size of individual facets in the grid to emphasize certain plots or account for differences in variable ranges.
- The interactive palette widgets now show a continuous colorbar, rather than a discrete palette, when as_cmap is True.
- The default Axes size for pairplot and PairGrid is now slightly smaller.
- Added the shade_lowest parameter to kdeplot which will set the alpha for the lowest contour level to 0, making it easier to plot multiple bivariate distributions on the same axes.
- The height parameter of rugplot is now interpreted as a function of the axis size and is invariant to changes in the data scale on that axis. The rug lines are also slightly narrower by default.
- Added a catch in distplot when calculating a default number of bins. For highly skewed data it will now use sqrt(n) bins, where previously the reference rule would return "infinite" bins and cause an exception in matplotlib.
- Added a ceiling (50) to the default number of bins used for distplot histograms. This will help avoid confusing errors with certain kinds of datasets that heavily violate the assumptions of the reference rule used to get a default number of bins. The ceiling is not applied when passing a specific number of bins.
- The various property dictionaries that can be passed to plt.boxplot are now applied after the seaborn restyling to allow for full customizability.
- Added a savefig method to JointGrid that defaults to a tight bounding box to make it easier to save figures using this class, and set a tight bbox as the default for the savefig method on other Grid objects.
- You can now pass an integer to the xticklabels and yticklabels parameters of heatmap (and, by extension, clustermap). This will make the plot use the ticklabels inferred from the data, but only plot every nth label, where n is the number you pass. This can help when visualizing larger matrices with some sensible ordering to the rows or columns of the dataframe.
- Added "figure.facecolor" to the style parameters and set the default to white.
- The load_dataset function now caches datasets locally after downloading them, and uses the local copy on subsequent calls.

Bug fixes

- Fixed bugs in clustermap where the mask and specified ticklabels were not being reorganized using the dendrograms.
- Fixed a bug in FacetGrid and PairGrid that led to incorrect legend labels when levels of the hue variable appeared in hue_order but not in the data.
- Fixed a bug in FacetGrid.set_xticklabels and FacetGrid.set_yticklabels that occurred when col_wrap is used.
- Fixed a bug in PairGrid where the hue_order parameter was ignored.
- Fixed two bugs in despine that caused errors when trying to trim the spines on plots that had inverted axes or no ticks.
- Improved support for the margin_titles option in FacetGrid, which can now be used with a legend.
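
The default bin selection for distplot described under "Other additions and changes" (reference rule, sqrt(n) fallback for highly skewed data, ceiling of 50) can be sketched in plain Python. This is an illustrative sketch only, not seaborn's implementation: the function name default_hist_bins is hypothetical, and the Freedman-Diaconis rule is an assumption standing in for the "reference rule", which these notes do not name.

```python
import math
import statistics


def default_hist_bins(values, ceiling=50):
    """Hypothetical sketch of a default histogram bin count.

    Applies a reference rule, falls back to sqrt(n) when the rule
    degenerates, and caps the result at `ceiling`.
    """
    data = sorted(values)
    n = len(data)

    # Interquartile range from the 25th and 75th percentiles.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")

    # ASSUMPTION: Freedman-Diaconis bin width plays the role of the
    # "reference rule"; the release notes do not say which rule is used.
    width = 2 * (q3 - q1) / n ** (1 / 3)

    if width == 0:
        # Highly skewed data: a zero bin width would imply "infinite" bins,
        # so fall back to sqrt(n) bins instead.
        bins = int(math.sqrt(n))
    else:
        bins = math.ceil((data[-1] - data[0]) / width)

    # Cap the default. An explicitly requested bin count would bypass
    # this function entirely, so the ceiling never applies to it.
    return min(bins, ceiling)


# Mostly zeros with a few outliers: the IQR is 0, so sqrt(100) bins are used.
print(default_hist_bins([0] * 95 + [1, 2, 3, 4, 100]))  # 10

# One extreme outlier makes the rule ask for thousands of bins: capped at 50.
print(default_hist_bins(list(range(1000)) + [10**6]))  # 50
```

Keeping the ceiling separate from the fallback mirrors the two changes listed above: the fallback handles the case where the rule cannot produce a finite answer, while the ceiling handles the case where the answer is finite but impractically large.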