tidyr

Package for data reshaping and tidying. Provides essential data cleaning functions including wide-to-long format conversion, nested data manipulation, and missing value handling.

R, data reshaping, data transformation, tidyverse, pivot, reshape, data cleaning

GitHub Overview

tidyverse/tidyr

Tidy Messy Data

Stars: 1,406
Watchers: 70
Forks: 419
Created: June 10, 2014
Language: R
License: Other

Topics

r, tidy-data

Overview

tidyr is a package for data reshaping and tidying in R. Its pivot_longer() and pivot_wider() functions convert between wide and long formats, so that data can be structured according to tidy data principles.

Details

tidyr is a data reshaping package for R developed by Hadley Wickham. It implements the concept of "tidy data" and streamlines data preprocessing for analysis. First released in 2014, it is positioned as the successor to the reshape2 package.

Since version 1.0.0, pivot_longer() and pivot_wider() have been the main reshaping functions, offering more flexible transformations than the older gather() and spread(), which remain available but are no longer actively developed. The package also covers common data cleaning tasks: nest() and unnest() for nested data frames, separate() and unite() for splitting and combining columns, fill() for filling missing values from neighboring rows, and drop_na() for removing rows with missing values.

As part of the tidyverse, it integrates with dplyr and ggplot2 and supports fluent pipelines with the pipe operator (%>%). Data that follows the three principles of tidy data (each variable is a column, each observation is a row, each value is a cell) is considerably easier to analyze downstream.
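
As a rough illustration of how the older and newer reshaping functions relate, the sketch below runs gather() and pivot_longer() on the same small table. The `scores` data frame is a made-up example, not taken from the tidyr documentation.

```r
library(tidyr)

# Hypothetical toy data, used only for illustration
scores <- data.frame(
  id = 1:2,
  math = c(90, 85),
  english = c(88, 91)
)

# Older style: gather() with key/value column names
long_old <- gather(scores, key = "subject", value = "score", -id)

# Current style: pivot_longer() with names_to/values_to
long_new <- pivot_longer(scores, cols = -id,
                         names_to = "subject", values_to = "score")

# Both hold the same values; only the row order differs, so sort before comparing
long_old <- long_old[order(long_old$id, long_old$subject), ]
long_new <- as.data.frame(long_new)
long_new <- long_new[order(long_new$id, long_new$subject), ]
```

After sorting, both results contain the same four id/subject/score rows, which suggests that migrating simple gather() calls is mostly a matter of renaming arguments.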

Pros and Cons

Pros

  • Intuitive Function Names: Clear operations with verbs like pivot, nest, fill
  • Flexible Transformation: Concise description of complex data shape transformations
  • Tidyverse Integration: Consistent grammar with other packages
  • Powerful Pivot Functions: More functional than old gather/spread
  • Nested Data Support: Efficient list column processing
  • Missing Value Handling: Various imputation and removal options

Cons

  • Memory Usage: Reshaping large data frames can need considerably more memory than the input occupies
  • Learning Curve: Understanding tidy data concepts is a prerequisite
  • Performance: Can be slower than data.table in some cases
  • Backward Compatibility: Superseded functions such as gather() and spread() still appear in older code and tutorials, which can cause confusion
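
To make the backward-compatibility point concrete, here is a minimal sketch comparing separate() with its newer counterpart separate_wider_delim(). It assumes tidyr 1.3.0 or later, where the newer function is available; the sample data frame is illustrative.

```r
library(tidyr)

# Illustrative data: one column holding a date and a time
df <- data.frame(
  id = 1:3,
  date_time = c("2024-01-15 10:30", "2024-01-16 14:20", "2024-01-17 09:15")
)

# Older interface: still works, but marked as superseded in recent releases
old_way <- separate(df, date_time, into = c("date", "time"), sep = " ")

# Newer interface (assumes tidyr >= 1.3.0): the delimiter is a fixed string
new_way <- separate_wider_delim(df, date_time, delim = " ",
                                names = c("date", "time"))
```

Both calls yield the columns id, date, and time. Code written against either interface should keep running, but mixing the two styles in one project is a likely source of the confusion mentioned above.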

Main Use Cases

  • Wide/long format data conversion
  • Survey data reshaping
  • Time series data structuring
  • Multiple variable separation and combination
  • Systematic missing value processing
  • Database normalization
  • Data preparation for visualization

Basic Usage

Installation

# Install from CRAN
install.packages("tidyr")

# Install entire tidyverse
install.packages("tidyverse")

# Install development version (requires the devtools package)
devtools::install_github("tidyverse/tidyr")

# Load library
library(tidyr)

Basic Operations

# Sample data
df_wide <- data.frame(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie"),
  math_2023 = c(90, 85, 78),
  math_2024 = c(92, 88, 80),
  english_2023 = c(88, 90, 85),
  english_2024 = c(90, 91, 87)
)

# pivot_longer: Wide to long format
df_long <- df_wide %>%
  pivot_longer(
    cols = -c(id, name),
    names_to = c("subject", "year"),
    names_sep = "_",
    values_to = "score"
  )

# pivot_wider: Long to wide format
df_wide_back <- df_long %>%
  pivot_wider(
    names_from = c(subject, year),
    values_from = score,
    names_sep = "_"
  )

# separate: Column splitting
df <- data.frame(
  id = 1:3,
  date_time = c("2024-01-15 10:30", "2024-01-16 14:20", "2024-01-17 09:15")
)

df_separated <- df %>%
  separate(date_time, into = c("date", "time"), sep = " ")

# unite: Column combining
df_united <- df_separated %>%
  unite("datetime", date, time, sep = "T")

# fill: Missing value imputation
df_missing <- data.frame(
  year = c(2020, NA, NA, 2023, NA),
  value = c(100, 110, 120, 130, 140)
)

df_filled <- df_missing %>%
  fill(year, .direction = "down")
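
The fill() call above only carries values downward. As a small sketch of two other options, the example below uses the "downup" direction, which also covers leading gaps, and replace_na(), which substitutes a fixed value instead of copying a neighbor. The data frame is a hypothetical variation on the one above.

```r
library(tidyr)

# Hypothetical data with a leading NA in year and gaps in value
df_gaps <- data.frame(
  year = c(NA, 2021, NA, 2023, NA),
  value = c(100, NA, 120, NA, 140)
)

# "downup" fills downward first, then upward, so the leading NA is filled too
df_gaps_filled <- fill(df_gaps, year, .direction = "downup")

# replace_na() takes a named list of replacement values, one per column
df_gaps_replaced <- replace_na(df_gaps_filled, list(value = 0))
```

Which approach fits depends on what a missing value means in the data: carrying a neighbor forward suits repeated labels, while a fixed replacement suits counts or amounts where "missing" means "none".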

Advanced Operations

# Complex pivot operations
set.seed(42)  # make the random sample data reproducible
sales_data <- data.frame(
  store = rep(c("A", "B"), each = 6),
  product = rep(c("X", "Y", "Z"), 4),
  month = rep(c("Jan", "Feb"), each = 3, times = 2),
  revenue = runif(12, 1000, 5000),
  units = sample(50:200, 12)
)

# Pivot multiple value columns simultaneously
# names_glue sets the output column names, so names_sep is not needed
sales_wide <- sales_data %>%
  pivot_wider(
    names_from = c(product, month),
    values_from = c(revenue, units),
    names_glue = "{.value}_{product}_{month}"
  )

# nest: Data nesting (group_by() comes from dplyr, which must be installed)
nested_data <- sales_data %>%
  dplyr::group_by(store) %>%
  nest()

# unnest: Unnesting
unnested_data <- nested_data %>%
  unnest(cols = data)

# complete: Fill in missing combinations
# (sales_data already contains every store/product/month combination,
# so no rows are added here)
complete_data <- sales_data %>%
  complete(store, product, month, 
           fill = list(revenue = 0, units = 0))

# drop_na: Remove rows with missing values
clean_data <- data.frame(
  x = c(1, 2, NA, 4),
  y = c("a", NA, "c", "d"),
  z = c(TRUE, TRUE, FALSE, NA)
) %>%
  drop_na()  # Remove rows with NA in any column

# Target specific columns only
clean_data_x <- data.frame(
  x = c(1, 2, NA, 4),
  y = c("a", NA, "c", "d")
) %>%
  drop_na(x)  # Consider only NA in x column
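
The complete() call above operates on data that already contains every store/product/month combination, so its effect is hard to see. The sketch below uses a deliberately sparse, made-up table in which one combination is absent.

```r
library(tidyr)

# Hypothetical sparse data: store B has no row for product Y
sales_sparse <- data.frame(
  store = c("A", "A", "B"),
  product = c("X", "Y", "X"),
  units = c(10, 5, 7)
)

# complete() adds the missing store/product pair and fills units with 0
sales_full <- complete(sales_sparse, store, product,
                       fill = list(units = 0))

# Look up the row that complete() created
added_row <- sales_full[sales_full$store == "B" & sales_full$product == "Y", ]
```

Here the result has four rows instead of three, and the added row for store B and product Y carries units = 0. Without the fill argument, that cell would be NA.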

Practical Examples

# Time series data reshaping
library(dplyr)

set.seed(123)  # make the random sample data reproducible

# Multiple indicator time series data
economic_data <- data.frame(
  date = seq(as.Date("2023-01-01"), by = "month", length.out = 12),
  gdp_growth = runif(12, -2, 5),
  inflation = runif(12, 0, 4),
  unemployment = runif(12, 3, 7)
)

# Convert to long format for analysis
economic_long <- economic_data %>%
  pivot_longer(
    cols = -date,
    names_to = "indicator",
    values_to = "value"
  ) %>%
  mutate(
    indicator = factor(indicator, 
                      levels = c("gdp_growth", "inflation", "unemployment"),
                      labels = c("GDP Growth", "Inflation Rate", "Unemployment Rate"))
  )

# Survey data reshaping
survey_data <- data.frame(
  respondent_id = 1:100,
  q1_satisfied = sample(1:5, 100, replace = TRUE),
  q1_important = sample(1:5, 100, replace = TRUE),
  q2_satisfied = sample(1:5, 100, replace = TRUE),
  q2_important = sample(1:5, 100, replace = TRUE)
)

survey_long <- survey_data %>%
  pivot_longer(
    cols = -respondent_id,
    names_to = c("question", "measure"),
    names_sep = "_",
    values_to = "score"
  ) %>%
  pivot_wider(
    names_from = measure,
    values_from = score
  )
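
The survey example above reshapes in two steps, first long and then wide again. A possible one-step alternative uses the special ".value" entry in names_to, which tells pivot_longer() that this part of each column name should become an output column name. The sketch uses a small, fixed stand-in for the survey data so the result is easy to inspect.

```r
library(tidyr)

# Small fixed stand-in for survey_data (values are made up)
survey_small <- data.frame(
  respondent_id = 1:3,
  q1_satisfied = c(4, 5, 3),
  q1_important = c(5, 4, 4),
  q2_satisfied = c(2, 3, 5),
  q2_important = c(3, 3, 4)
)

# ".value" turns the part after "_" into column names (satisfied, important),
# while the part before "_" goes into the question column
survey_tidy <- pivot_longer(
  survey_small,
  cols = -respondent_id,
  names_to = c("question", ".value"),
  names_sep = "_"
)
```

The result has one row per respondent and question, with satisfied and important as separate columns, the same shape the two-step version produces.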

Latest Trends (2025)

  • Pivot Function Extensions: Support for more complex transformation patterns
  • Performance Improvements: Speed optimization for large datasets
  • Enhanced Type Safety: Stronger integration with vctrs package
  • New Helper Functions: Data validation and diagnostic tools
  • Arrow Integration: More efficient data exchange

Summary

tidyr continues to play an important role as the standard tool for data reshaping in R in 2025. With intuitive functions centered on pivot_longer() and pivot_wider(), even complex data transformations can be written concisely. As part of the tidyverse ecosystem, it is indispensable in the preprocessing stage of data science workflows. By practicing tidy data principles, subsequent analysis and visualization can be significantly streamlined.