Skip to main content

Seaborn for Statistical Plots

Master the Art of Statistical Visualization! 🎨

Seaborn is a Python data visualization library based on matplotlib that provides a high-level interface for drawing attractive and informative statistical graphics. With Seaborn, you can create beautiful, publication-ready visualizations with just a few lines of code!

Why Seaborn?

Seaborn builds on top of matplotlib and integrates closely with pandas data structures. Here's what makes it special:

Setting Up Seaborn

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set the style
sns.set_theme(style="whitegrid")

# Load sample datasets
tips = sns.load_dataset("tips")
iris = sns.load_dataset("iris")
titanic = sns.load_dataset("titanic")

print("Seaborn version:", sns.__version__)
print("Available built-in datasets:", sns.get_dataset_names())

Seaborn Styles and Themes

Seaborn offers several built-in themes to quickly change the look of your plots:

# Set different styles
sns.set_theme(style="whitegrid")  # Clean with grid
sns.set_theme(style="darkgrid")   # Dark background with grid
sns.set_theme(style="white")      # Plain white background
sns.set_theme(style="dark")       # Plain dark background
sns.set_theme(style="ticks")      # White with tick marks

# Set context for different use cases
sns.set_context("notebook")  # Default
sns.set_context("paper")     # Smaller elements
sns.set_context("talk")      # Larger elements for presentations
sns.set_context("poster")    # Even larger for posters

# Custom context scaling
sns.set_context("notebook", font_scale=1.5, rc={"lines.linewidth": 2.5})

Color Palettes

Color palettes are one of Seaborn's strongest features. You can use built-in palettes or create custom ones:

# Built-in color palettes
sns.color_palette("deep")      # Default palette
sns.color_palette("pastel")    # Soft colors
sns.color_palette("bright")    # Vivid colors
sns.color_palette("dark")      # Dark colors
sns.color_palette("colorblind") # Colorblind-friendly

# Sequential palettes (good for ordered data)
sns.color_palette("Blues")     # Single hue gradient
sns.color_palette("rocket")    # Multi-hue gradient

# Diverging palettes (good for data with a meaningful center)
sns.color_palette("coolwarm")  # Cool to warm
sns.color_palette("RdBu")      # Red to blue

# Custom palettes
custom_palette = ["#FF6B6B", "#4ECDC4", "#45B7D1", "#96CEB4", "#FFEAA7"]
sns.set_palette(custom_palette)

# Using palettes in plots
plt.figure(figsize=(10, 6))
sns.barplot(data=tips, x="day", y="total_bill", palette="Set2")
plt.title("Average Bill by Day with Custom Palette")
plt.show()

Distribution Plots

Histogram with KDE

# Basic histogram
sns.histplot(data=tips, x="total_bill")

# With KDE overlay
sns.histplot(data=tips, x="total_bill", kde=True, stat="density")

# Multiple variables with hue
sns.histplot(data=tips, x="total_bill", hue="time", multiple="stack")

# 2D histogram for bivariate data
sns.histplot(data=tips, x="total_bill", y="tip", bins=30)

KDE Plots

# Basic KDE
sns.kdeplot(data=tips, x="total_bill")

# Multiple distributions
sns.kdeplot(data=tips, x="total_bill", hue="time", fill=True)

# 2D KDE for bivariate relationships
sns.kdeplot(data=tips, x="total_bill", y="tip", levels=5)

# Cumulative distribution
sns.kdeplot(data=tips, x="total_bill", cumulative=True)

Categorical Plots

Seaborn excels at visualizing categorical data with various plot types:

# Strip plot - shows all points
sns.stripplot(data=tips, x="day", y="total_bill", hue="sex", dodge=True)

# Swarm plot - non-overlapping points
sns.swarmplot(data=tips, x="day", y="total_bill", hue="sex", dodge=True)

# Box plot - shows quartiles and outliers
sns.boxplot(data=tips, x="day", y="total_bill", hue="smoker")

# Violin plot - combines KDE and box plot
sns.violinplot(data=tips, x="day", y="total_bill", hue="sex", split=True)

# Bar plot - shows mean with error bars
sns.barplot(data=tips, x="day", y="total_bill", hue="sex", ci="sd")

# Count plot - shows counts of observations
sns.countplot(data=tips, x="day", hue="sex", palette="Set2")

Regression and Relationship Plots

# Linear regression plot
plt.figure(figsize=(12, 4))

plt.subplot(1, 3, 1)
sns.regplot(data=tips, x="total_bill", y="tip")
plt.title("Basic Linear Regression")

plt.subplot(1, 3, 2)
sns.regplot(data=tips, x="total_bill", y="tip", order=2, ci=95)
plt.title("Polynomial Regression (Order 2)")

plt.subplot(1, 3, 3)
sns.regplot(data=tips, x="total_bill", y="tip", lowess=True)
plt.title("LOWESS Smoothing")

plt.tight_layout()
plt.show()

# Residual plot
plt.figure(figsize=(10, 5))
sns.residplot(data=tips, x="total_bill", y="tip", lowess=True, color="g")
plt.title("Residual Plot")
plt.show()

Matrix Plots - Heatmaps and Clustering

# Correlation heatmap
plt.figure(figsize=(10, 8))
correlation_matrix = tips.select_dtypes(include=[np.number]).corr()
sns.heatmap(correlation_matrix, 
           annot=True,           # Show values
           cmap="coolwarm",      # Color scheme
           vmin=-1, vmax=1,      # Scale limits
           center=0,             # Center point
           square=True,          # Square cells
           linewidths=1,         # Grid lines
           cbar_kws={"shrink": 0.8})
plt.title("Correlation Heatmap")
plt.show()

# Clustermap (hierarchical clustering)
plt.figure(figsize=(10, 8))
sns.clustermap(correlation_matrix, 
              cmap="RdBu_r", 
              annot=True, 
              fmt=".2f",
              linewidths=0.5)
plt.title("Hierarchical Clustering of Features")
plt.show()

Pair Plots and Joint Plots

# Pair plot - explore relationships between all numeric variables
sns.pairplot(iris, hue="species", 
            diag_kind="kde",     # Diagonal plot type
            markers=["o", "s", "D"],  # Different markers
            palette="husl")

# Advanced pair plot with regression
sns.pairplot(tips[["total_bill", "tip", "size"]], 
            kind="reg",          # Regression plots
            diag_kind="kde",
            plot_kws={'line_kws':{'color':'red'}})

# Joint plot - detailed view of two variables
g = sns.jointplot(data=tips, x="total_bill", y="tip", 
                 kind="hex",     # Hexbin plot
                 marginal_kws=dict(bins=25))

# Different joint plot types
kinds = ["scatter", "kde", "hist", "hex", "reg", "resid"]
for kind in kinds:
    sns.jointplot(data=tips, x="total_bill", y="tip", kind=kind)
    plt.suptitle(f"Joint Plot - {kind.title()}")

FacetGrid - Multi-plot Grids

FacetGrid allows you to create a grid of plots based on categorical variables:

# Basic FacetGrid
g = sns.FacetGrid(tips, col="time", row="smoker", 
                 height=4, aspect=1.2)
g.map(sns.scatterplot, "total_bill", "tip")
g.add_legend()

# With histogram
g = sns.FacetGrid(tips, col="day", col_wrap=2, height=3)
g.map(sns.histplot, "total_bill", kde=True)

# Complex mapping with multiple layers
g = sns.FacetGrid(tips, col="time", row="smoker", 
                 hue="sex", palette="Set1",
                 height=4, aspect=1.2)
g.map(sns.scatterplot, "total_bill", "tip", alpha=0.7)
g.map(sns.regplot, "total_bill", "tip", scatter=False)
g.add_legend()

# Categorical plots in grid
g = sns.FacetGrid(tips, col="day", height=4, aspect=0.5)
g.map(sns.boxplot, "time", "total_bill", order=["Lunch", "Dinner"])

Real-world Example: Sales Analysis Dashboard

# Create sample sales data
np.random.seed(42)
dates = pd.date_range('2024-01-01', periods=365)
sales_data = pd.DataFrame({
    'date': dates,
    'sales': np.random.normal(1000, 200, 365) + np.sin(np.arange(365) * 2 * np.pi / 365) * 300,
    'region': np.random.choice(['North', 'South', 'East', 'West'], 365),
    'product': np.random.choice(['A', 'B', 'C'], 365),
    'customer_type': np.random.choice(['New', 'Returning'], 365)
})
sales_data['month'] = sales_data['date'].dt.month
sales_data['quarter'] = sales_data['date'].dt.quarter

# Create comprehensive dashboard
fig = plt.figure(figsize=(16, 12))

# 1. Time series with trend
ax1 = plt.subplot(3, 3, 1)
sns.lineplot(data=sales_data, x='date', y='sales', ax=ax1)
ax1.set_title('Sales Trend Over Time')
ax1.tick_params(axis='x', rotation=45)

# 2. Sales by region
ax2 = plt.subplot(3, 3, 2)
sns.boxplot(data=sales_data, x='region', y='sales', ax=ax2)
ax2.set_title('Sales Distribution by Region')

# 3. Product performance
ax3 = plt.subplot(3, 3, 3)
product_sales = sales_data.groupby('product')['sales'].mean().reset_index()
sns.barplot(data=product_sales, x='product', y='sales', palette='viridis', ax=ax3)
ax3.set_title('Average Sales by Product')

# 4. Customer type comparison
ax4 = plt.subplot(3, 3, 4)
sns.violinplot(data=sales_data, x='customer_type', y='sales', split=True, ax=ax4)
ax4.set_title('Sales by Customer Type')

# 5. Monthly patterns
ax5 = plt.subplot(3, 3, 5)
monthly_sales = sales_data.groupby('month')['sales'].mean().reset_index()
sns.lineplot(data=monthly_sales, x='month', y='sales', marker='o', ax=ax5)
ax5.set_title('Monthly Sales Pattern')
ax5.set_xticks(range(1, 13))

# 6. Region-Product heatmap
ax6 = plt.subplot(3, 3, 6)
pivot_table = sales_data.pivot_table(values='sales', index='region', columns='product', aggfunc='mean')
sns.heatmap(pivot_table, annot=True, fmt='.0f', cmap='YlOrRd', ax=ax6)
ax6.set_title('Sales Heatmap: Region vs Product')

# 7. Quarterly comparison
ax7 = plt.subplot(3, 3, 7)
sns.countplot(data=sales_data, x='quarter', hue='region', ax=ax7)
ax7.set_title('Transaction Count by Quarter')

# 8. Sales distribution
ax8 = plt.subplot(3, 3, 8)
sns.histplot(data=sales_data, x='sales', kde=True, bins=30, ax=ax8)
ax8.set_title('Overall Sales Distribution')

# 9. Correlation matrix
ax9 = plt.subplot(3, 3, 9)
numeric_cols = sales_data.select_dtypes(include=[np.number]).columns
corr_matrix = sales_data[numeric_cols].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0, ax=ax9)
ax9.set_title('Feature Correlation')

plt.suptitle('Sales Analysis Dashboard', fontsize=16, y=1.02)
plt.tight_layout()
plt.show()

Best Practices and Tips

🎨 Choose the Right Plot

🎯 Optimize for Your Audience

⚡ Performance Tips

Common Patterns and Use Cases

graph TD A[Data Type] --> B{Continuous?} B -->|Yes| C[Distribution] B -->|No| D[Categorical] C --> E[histplot] C --> F[kdeplot] C --> G[ecdfplot] D --> H[countplot] D --> I[barplot] D --> J[boxplot] A --> K{Bivariate?} K -->|Yes| L[Relationship] L --> M[scatterplot] L --> N[regplot] L --> O[jointplot] K -->|Multivariate| P[Matrix] P --> Q[pairplot] P --> R[heatmap] P --> S[clustermap]

Practice Exercises

Exercise 1: Create a Statistical Dashboard

Using the tips dataset, create a 4-panel dashboard showing:

  1. Distribution of tips by day (use violinplot)
  2. Relationship between bill and tip with regression line
  3. Average bill by time and day (use heatmap)
  4. Correlation heatmap of numeric variables

Exercise 2: Custom Color Palette

Create a custom color palette matching your brand colors and apply it to a multi-category plot. Use at least 5 different colors and demonstrate it with a barplot.

Exercise 3: Complex FacetGrid

Use FacetGrid to create a grid of plots showing how tips vary by:

Key Takeaways

Further Resources