ggplot2

A powerful data visualization package for R based on the Grammar of Graphics. It builds complex graphs incrementally through a layer system and produces polished, insightful visualizations with a rich set of geoms, statistical transformations, and themes.

R, data visualization, graphs, charts, Grammar of Graphics, tidyverse, plotting

GitHub Overview

tidyverse/ggplot2

An implementation of the Grammar of Graphics in R

Stars: 6,727
Watchers: 298
Forks: 2,094
Created: May 25, 2008
Language: R
License: Other

Topics

data-visualisation, r, visualisation

Star History

[Chart: tidyverse/ggplot2 star history]
Data as of: 7/16/2025, 11:28 AM


Overview

ggplot2 is a data visualization package for R built on the "Grammar of Graphics" concept. Its layering system lets you create sophisticated statistical graphics declaratively by combining data, aesthetic mappings, and geometric objects.

Details

ggplot2 is a data visualization package for R created by Hadley Wickham and first released in 2007. It implements Leland Wilkinson's Grammar of Graphics, which decomposes a graphic into independent components: data, aesthetic mappings (aes), geometric objects (geoms), statistical transformations (stats), scales, coordinate systems, facets, and themes. Combining these components with the + operator lets you build a wide range of visualizations, from simple scatter plots to complex multi-layered graphs. The ggproto object system makes the package extensible, so you can define your own geoms and stats. As part of the tidyverse, it works smoothly with dplyr and tidyr, supporting a consistent workflow from data preparation to visualization. It has become a standard visualization tool in fields ranging from academic publishing to business reporting and data journalism.
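To make the decomposition concrete, the sketch below writes out each component for a single scatter plot of the mpg data. Most of these calls only restate ggplot2's defaults (identity stat and position, Cartesian coordinates, the grey theme), so everyday code would omit them; the axis titles and the choice to facet by year are illustrative.

```r
library(ggplot2)

# Each grammar component written out explicitly (most are defaults)
p_grammar <- ggplot(data = mpg) +                          # data
  geom_point(                                              # geometric object
    mapping  = aes(x = displ, y = hwy, colour = drv),      # aesthetic mapping
    stat     = "identity",                                 # statistical transformation
    position = "identity"                                  # position adjustment
  ) +
  scale_x_continuous(name = "Displacement (L)") +          # scales
  scale_y_continuous(name = "Highway MPG") +
  scale_colour_discrete(name = "Drive train") +
  coord_cartesian() +                                      # coordinate system
  facet_wrap(~ year) +                                     # facets (one panel per model year)
  theme_grey()                                             # theme

# print(p_grammar) renders the plot
```

Removing any of the default calls (stat, position, coord_cartesian(), theme_grey()) leaves the plot unchanged, which is why they rarely appear in practice.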

Pros and Cons

Pros

  • Theoretical Foundation: Systematic approach based on Grammar of Graphics
  • Flexibility: Build complex graphs incrementally with the layering system
  • Beautiful Defaults: Sophisticated default themes
  • Tidyverse Integration: Consistency from data processing to visualization
  • Rich Extension Packages: Thriving ggplot2 extension ecosystem
  • Reproducibility: Completely reproducible through code
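The extensibility behind the extension ecosystem comes from ggproto, the object system ggplot2 uses for its own geoms and stats. The sketch below follows the pattern shown in ggplot2's "Extending ggplot2" vignette: a new Stat subclass that keeps only the convex-hull rows of each group, plus a layer constructor so it can be added with +. StatChull and stat_chull() are illustrative names, not part of ggplot2.

```r
library(ggplot2)

# Illustrative Stat: keeps only the rows on each group's convex hull.
# compute_group() receives one group's data (with x and y columns) at a time.
StatChull <- ggproto("StatChull", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    # grDevices::chull() returns the row indices of the hull vertices
    data[chull(data$x, data$y), , drop = FALSE]
  }
)

# Illustrative constructor wrapping layer(), mirroring built-in stat_*() functions
stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Usage: outline the point cloud with its convex hull
p_hull <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  stat_chull(fill = NA, colour = "black")
```

Because the logic lives in compute_group(), mapping colour = drv in the same plot would draw one hull per drive type without further changes.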

Cons

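  • Learning Curve: The grammar's concepts take time to learn
  • Performance: Rendering can be slow with very large datasets
  • Memory Usage: Plot objects carry their full input data, which can make them large
  • Interactivity: Output is static by default; interactivity requires extension packages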
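One common way to reduce the performance cost with very large data is to summarise before drawing. The sketch below bins simulated points (the data are made up for illustration) with geom_bin_2d(), so ggplot2 draws a grid of tiles rather than one glyph per row; the bin count of 50 is an arbitrary choice.

```r
library(ggplot2)

# Simulated data: 100,000 points from a standard bivariate normal (illustrative only)
set.seed(1)
n <- 1e5
big <- data.frame(x = rnorm(n), y = rnorm(n))

# geom_point() would draw one glyph per row; geom_bin_2d() counts points
# into a 50 x 50 grid and draws at most one tile per grid cell instead
p_binned <- ggplot(big, aes(x = x, y = y)) +
  geom_bin_2d(bins = 50) +
  scale_fill_viridis_c()
```

The trade-off is that individual points are no longer visible; the fill colour shows how many points fall in each cell.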

Main Use Cases

  • Academic paper figure creation
  • Business dashboards
  • Exploratory data analysis
  • Statistical reports
  • Data journalism
  • Presentation materials
  • Quality control charts

Basic Usage

Installation

# Install from CRAN
install.packages("ggplot2")

# Or install the whole tidyverse, which includes ggplot2
install.packages("tidyverse")

# Or install the development version from GitHub (requires devtools)
devtools::install_github("tidyverse/ggplot2")

# Load the package
library(ggplot2)

Basic Plots

# Sample data bundled with ggplot2
data(mpg)  # Fuel economy data: 234 cars, 11 variables (1999 and 2008 models)

# Scatter plot
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Colored scatter plot
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

# Add regression line
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm")

# Bar chart
ggplot(mpg, aes(x = class)) +
  geom_bar()

# Histogram
ggplot(mpg, aes(x = hwy)) +
  geom_histogram(bins = 30)

# Box plot
ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()
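In the bar chart above, geom_bar() has no y variable to plot: its default stat, stat_count(), counts the rows for each x value. The sketch below makes that step visible by building the same chart from pre-computed counts with geom_col(), which plots y values exactly as given.

```r
library(ggplot2)

# geom_bar(): heights are computed by its default stat, stat_count()
p_count <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# geom_col(): heights come straight from the data, so count first
class_counts <- as.data.frame(table(class = mpg$class), responseName = "n")
p_col <- ggplot(class_counts, aes(x = class, y = n)) +
  geom_col()
```

Both plots draw the same bars. geom_col() is the natural choice when the counts or totals already exist, for example after a dplyr summary.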

Utilizing the Layering System

# Stack multiple layers
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "loess", se = TRUE) +
  labs(
    title = "Engine Displacement vs Highway Fuel Economy",
    x = "Displacement (L)",
    y = "Highway MPG",
    color = "Vehicle Class"
  ) +
  theme_minimal()

# Faceting (small multiples)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, ncol = 3)

# Grouping and aesthetic mapping
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = factor(cyl), shape = drv), size = 3) +
  scale_color_brewer(palette = "Set1") +
  theme_bw()
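Because a ggplot is an ordinary R object, a shared base plot can be stored once and extended in different directions, which is how the layering system supports incremental construction. The object names below (p_base, with_trend, by_drive) are illustrative.

```r
library(ggplot2)

# Store the shared base plot once...
p_base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# ...then derive variants by adding components with +
with_trend <- p_base + geom_smooth(method = "lm", formula = y ~ x)
by_drive   <- p_base + facet_grid(drv ~ cyl)   # rows: drive train, columns: cylinders
```

Adding a component returns a new plot object, so p_base itself is left unchanged and can be reused.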

Advanced Customization

# Custom theme
my_theme <- theme(
  plot.title = element_text(size = 16, face = "bold"),
  axis.title = element_text(size = 12),
  legend.position = "bottom",
  panel.grid.minor = element_blank()
)
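A partial theme such as my_theme only overrides the elements it names. One way to apply it everywhere, sketched below under the assumption that theme_bw() is an acceptable base, is to add it to a complete theme and register the result with theme_set(), which changes the default for every plot drawn afterwards in the session.

```r
library(ggplot2)

# Same partial theme as above, repeated so this block runs on its own
my_theme <- theme(
  plot.title = element_text(size = 16, face = "bold"),
  axis.title = element_text(size = 12),
  legend.position = "bottom",
  panel.grid.minor = element_blank()
)

# Complete theme + partial overrides, set as the session default;
# theme_set() returns the previous theme
old_theme <- theme_set(theme_bw() + my_theme)

# Plots drawn from now on pick up the new default without "+ my_theme"
p_default <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point()
```

The default theme is applied when a plot is drawn, so theme_set(old_theme) restores the previous look for any plot printed afterwards.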

# Statistical summaries
ggplot(mpg, aes(x = class, y = hwy)) +
  stat_summary(
    fun = mean,
    geom = "bar",
    fill = "skyblue"
  ) +
  stat_summary(
    fun.data = mean_se,
    geom = "errorbar",
    width = 0.2
  ) +
  coord_flip() +
  my_theme
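stat_summary() computes the per-class mean inside the plot. To check or reuse those numbers, you can compute them first and plot the result. The sketch below uses base R's tapply() for the mean and for the standard error (sd / sqrt(n), equivalent to what mean_se() computes), then builds the same statistic with stat_summary() for comparison.

```r
library(ggplot2)

# Pre-compute per-class mean and standard error with base R
hwy_mean <- tapply(mpg$hwy, mpg$class, mean)
hwy_se   <- tapply(mpg$hwy, mpg$class, function(v) sd(v) / sqrt(length(v)))
summary_df <- data.frame(
  class = names(hwy_mean),
  mean  = as.vector(hwy_mean),
  se    = as.vector(hwy_se)
)

# Bars and error bars drawn directly from the summary table
p_pre <- ggplot(summary_df, aes(x = class, y = mean)) +
  geom_col(fill = "skyblue") +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2)

# The same means, computed inside the plot by stat_summary()
p_stat <- ggplot(mpg, aes(x = class, y = hwy)) +
  stat_summary(fun = mean, geom = "bar", fill = "skyblue")
```

Pre-computing costs a few extra lines but keeps the numbers available for tables or text in the same report.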

# Density plots
ggplot(mpg, aes(x = hwy, fill = drv)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(
    values = c("4" = "#E41A1C", "f" = "#377EB8", "r" = "#4DAF4A"),
    labels = c("4WD", "FWD", "RWD")
  )

# Annotations
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  annotate(
    "text",
    x = 5.5, y = 40,
    label = "Fuel efficient vehicles",
    size = 5,
    color = "red"
  ) +
  annotate(
    "rect",
    xmin = 1.5, xmax = 2.5,
    ymin = 35, ymax = 45,
    alpha = 0.2,
    fill = "yellow"
  )
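annotate() places fixed, hand-positioned marks. When labels should follow the data instead, a geom_text() layer with its own data argument is an alternative. In the sketch below, the 40 MPG cutoff is an arbitrary threshold chosen for illustration.

```r
library(ggplot2)

# Data-driven labels: only the rows above an illustrative 40 MPG cutoff
efficient <- subset(mpg, hwy > 40)

p_labelled <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_text(
    data = efficient,      # this layer uses its own, smaller data set
    aes(label = model),
    vjust = -0.8,          # place labels slightly above their points
    size = 3
  )
```

If the underlying data change, the labels move with the points, which fixed annotate() coordinates would not do.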

Latest Trends (2025)

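  • ggplot2 4.0: A major release that rebuilds the package's internals on the S7 object system
  • Interactive Extensions: Continued integration with plotly (via ggplotly()) and ggiraph
  • Accessibility: Palettes suited to color vision deficiencies, such as the built-in viridis scales
  • AI Assist: AI tools that suggest plots and generate ggplot2 code
  • WebAssembly Support: Running R and ggplot2 directly in the browser via webR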

Summary

ggplot2 remains the de facto standard for data visualization in R in 2025. Its systematic approach based on the Grammar of Graphics and its polished defaults let even beginners create professional-quality graphs. Although its output is static by default, extensions such as plotly and ggiraph add interactivity. It is one of the most trusted visualization tools in data science practice.