tidyr

Package for data reshaping and tidying. Provides essential data cleaning functions including wide-to-long format conversion, nested data manipulation, and missing value handling.

R, data reshaping, data transformation, tidyverse, pivot, reshape, data cleaning

GitHub Overview

tidyverse/tidyr

Tidy Messy Data

Repository: https://github.com/tidyverse/tidyr

Homepage: https://tidyr.tidyverse.org/

Stars: 1,406

Watchers: 70

Forks: 419

Created: June 10, 2014

Language: R

License: Other

Topics

r, tidy-data

Data as of: 7/16/2025, 11:28 AM

Overview

tidyr is a package for data reshaping and tidying in R. With pivot_longer() and pivot_wider() functions, it enables intuitive conversion between wide and long formats, supporting data structuring based on tidy data principles.

Details

tidyr is a data reshaping package for R created by Hadley Wickham. It implements the concept of "tidy data" and streamlines data preprocessing for analysis. First released in 2014, it is positioned as the successor to the reshape2 package. Since version 1.0.0 (2019), pivot_longer() and pivot_wider() have been the main reshaping functions; they are more flexible than the older gather() and spread(), which remain available but are superseded. The package also covers the most common data cleaning tasks: nest() and unnest() for nested data frame operations, separate() and unite() for splitting and combining columns (separate() is itself superseded by the separate_wider_*() family as of tidyr 1.3.0), fill() for filling missing values from neighboring rows, and drop_na() for removing rows with missing values. As part of the tidyverse, it integrates with dplyr and ggplot2 and works naturally with the pipe operator (%>%). Data is tidy when it follows three rules: each variable is a column, each observation is a row, and each value is a cell. Getting data into that shape significantly simplifies subsequent analysis.
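
As a rough illustration of how the older and newer reshaping interfaces relate, the sketch below converts the same small table with gather() and with pivot_longer(). The scores data frame and its column names are made up for this example. The two results hold the same values but are not identical objects: row order differs, and pivot_longer() returns a tibble.

```r
library(tidyr)

# Hypothetical sample data: one row per student, one column per subject
scores <- data.frame(
  id = 1:2,
  math = c(90, 85),
  english = c(88, 90)
)

# Superseded interface: still works, but no longer actively developed
long_old <- gather(scores, key = "subject", value = "score", -id)

# Current interface: the same reshaping with explicit argument names
long_new <- pivot_longer(
  scores,
  cols = -id,
  names_to = "subject",
  values_to = "score"
)
```

When migrating older code, the key and value arguments of gather() correspond to names_to and values_to in pivot_longer().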

Pros and Cons

Pros

Intuitive Function Names: Clear operations with verbs like pivot, nest, fill
Flexible Transformation: Concise description of complex data shape transformations
Tidyverse Integration: Consistent grammar with other packages
Powerful Pivot Functions: More capable than the superseded gather() and spread()
Nested Data Support: Efficient list column processing
Missing Value Handling: Various imputation and removal options

Cons

Memory Usage: Requires attention during transformation with large data
Learning Curve: Understanding tidy data concepts is prerequisite
Performance: Can be slower than data.table in some cases
Backward Compatibility: Older functions such as gather(), spread(), and separate() are superseded, so older tutorials and code use a different API

Main Use Cases

Wide/long format data conversion
Survey data reshaping
Time series data structuring
Multiple variable separation and combination
Systematic missing value processing
Database normalization
Data preparation for visualization

Basic Usage

Installation

# Install from CRAN
install.packages("tidyr")

# Install entire tidyverse
install.packages("tidyverse")

# Install development version from GitHub (requires the devtools package)
devtools::install_github("tidyverse/tidyr")

# Load library
library(tidyr)

Basic Operations

# Sample data
df_wide <- data.frame(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie"),
  math_2023 = c(90, 85, 78),
  math_2024 = c(92, 88, 80),
  english_2023 = c(88, 90, 85),
  english_2024 = c(90, 91, 87)
)

# pivot_longer: Wide to long format
df_long <- df_wide %>%
  pivot_longer(
    cols = -c(id, name),
    names_to = c("subject", "year"),
    names_sep = "_",
    values_to = "score"
  )

# pivot_wider: Long to wide format
df_wide_back <- df_long %>%
  pivot_wider(
    names_from = c(subject, year),
    values_from = score,
    names_sep = "_"
  )

# separate: Column splitting
# (superseded since tidyr 1.3.0 by separate_wider_delim(), but still works)
df <- data.frame(
  id = 1:3,
  date_time = c("2024-01-15 10:30", "2024-01-16 14:20", "2024-01-17 09:15")
)

df_separated <- df %>%
  separate(date_time, into = c("date", "time"), sep = " ")

# unite: Column combining
df_united <- df_separated %>%
  unite("datetime", date, time, sep = "T")

# fill: Missing value imputation
df_missing <- data.frame(
  year = c(2020, NA, NA, 2023, NA),
  value = c(100, 110, 120, 130, 140)
)

df_filled <- df_missing %>%
  fill(year, .direction = "down")
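
fill() only copies neighboring values, so it is often paired with replace_na() when a gap should receive an explicit constant instead. The following is a minimal sketch of that combination; the readings data frame and the choice of 0 as the replacement value are assumptions made for illustration.

```r
library(tidyr)

# Hypothetical sensor log with gaps in both columns
readings <- data.frame(
  sensor = c("a", NA, NA, "b", NA),
  value = c(1.5, NA, 2.0, NA, 3.5)
)

# Carry the last observed sensor label downward
filled <- fill(readings, sensor, .direction = "down")

# Replace the remaining NAs in value with an explicit constant
replaced <- replace_na(filled, list(value = 0))
```

Whether a carried-forward value or a constant is appropriate depends on what a missing entry means in the data, so the two steps are kept separate here.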

Advanced Operations

# Complex pivot operations
set.seed(123)  # make the randomly generated sample data reproducible
sales_data <- data.frame(
  store = rep(c("A", "B"), each = 6),
  product = rep(c("X", "Y", "Z"), 4),
  month = rep(c("Jan", "Feb"), each = 3, times = 2),
  revenue = runif(12, 1000, 5000),
  units = sample(50:200, 12)
)

# Pivot multiple value columns simultaneously
# names_glue builds each new column name (e.g. revenue_X_Jan),
# so a separate names_sep is not needed
sales_wide <- sales_data %>%
  pivot_wider(
    names_from = c(product, month),
    values_from = c(revenue, units),
    names_glue = "{.value}_{product}_{month}"
  )

# nest: Data nesting
library(dplyr)  # provides group_by()

nested_data <- sales_data %>%
  group_by(store) %>%
  nest()

# unnest: Unnesting
unnested_data <- nested_data %>%
  unnest(cols = data)

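# complete: Fill in missing combinations
# sales_data already contains every store/product/month combination,
# so two rows are dropped first to show them being restored
complete_data <- sales_data[-c(2, 9), ] %>%
  complete(store, product, month,
           fill = list(revenue = 0, units = 0))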

# drop_na: Remove rows with missing values
clean_data <- data.frame(
  x = c(1, 2, NA, 4),
  y = c("a", NA, "c", "d"),
  z = c(TRUE, TRUE, FALSE, NA)
) %>%
  drop_na()  # Remove rows with NA in any column

# Target specific columns only
clean_data_x <- data.frame(
  x = c(1, 2, NA, 4),
  y = c("a", NA, "c", "d")
) %>%
  drop_na(x)  # Consider only NA in x column
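
The nest() and unnest() calls above can also be used without dplyr, and the nested list column can be processed with ordinary base R tools. The sketch below assumes a small made-up sales table and computes one total per store before unnesting again.

```r
library(tidyr)

# Hypothetical data: several sales rows per store
sales <- data.frame(
  store = c("A", "A", "B", "B", "B"),
  revenue = c(100, 150, 200, 50, 75)
)

# One row per store; every column except store is packed
# into a list column named data
nested <- nest(sales, data = -store)

# Summarise each nested data frame with base R
nested$total <- vapply(nested$data, function(d) sum(d$revenue), numeric(1))

# unnest() restores one row per sale and keeps the new total column
restored <- unnest(nested, cols = data)
```

Selecting the columns to nest with data = -store gives the same grouping as group_by(store) followed by nest(), but returns an ungrouped result.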

Practical Examples

# Time series data reshaping
library(dplyr)

# Multiple indicator time series data
economic_data <- data.frame(
  date = seq(as.Date("2023-01-01"), by = "month", length.out = 12),
  gdp_growth = runif(12, -2, 5),
  inflation = runif(12, 0, 4),
  unemployment = runif(12, 3, 7)
)

# Convert to long format for analysis
economic_long <- economic_data %>%
  pivot_longer(
    cols = -date,
    names_to = "indicator",
    values_to = "value"
  ) %>%
  mutate(
    indicator = factor(indicator, 
                      levels = c("gdp_growth", "inflation", "unemployment"),
                      labels = c("GDP Growth", "Inflation Rate", "Unemployment Rate"))
  )

# Survey data reshaping
survey_data <- data.frame(
  respondent_id = 1:100,
  q1_satisfied = sample(1:5, 100, replace = TRUE),
  q1_important = sample(1:5, 100, replace = TRUE),
  q2_satisfied = sample(1:5, 100, replace = TRUE),
  q2_important = sample(1:5, 100, replace = TRUE)
)

survey_long <- survey_data %>%
  pivot_longer(
    cols = -respondent_id,
    names_to = c("question", "measure"),
    names_sep = "_",
    values_to = "score"
  ) %>%
  pivot_wider(
    names_from = measure,
    values_from = score
  )
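
The survey reshaping above takes two steps, pivot_longer() followed by pivot_wider(). A possible one-step alternative uses the special ".value" name in names_to, which tells pivot_longer() to turn that part of each column name into an output column. The sketch below uses a smaller, fixed version of the survey data so the result is easy to inspect; the values are made up.

```r
library(tidyr)

# Hypothetical survey answers with the same q<N>_<measure> naming scheme
survey <- data.frame(
  respondent_id = 1:3,
  q1_satisfied = c(4, 5, 3),
  q1_important = c(5, 4, 4),
  q2_satisfied = c(2, 3, 5),
  q2_important = c(3, 3, 4)
)

# The part before "_" becomes the question column; the part after "_"
# becomes the name of an output column because of ".value"
survey_tidy <- pivot_longer(
  survey,
  cols = -respondent_id,
  names_to = c("question", ".value"),
  names_sep = "_"
)
```

The result has one row per respondent and question, with satisfied and important as separate columns, which is the same shape the two-step version produces.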

Latest Trends (2025)

Pivot Function Extensions: Support for more complex transformation patterns
Performance Improvements: Speed optimization for large datasets
Enhanced Type Safety: Stronger integration with vctrs package
New Helper Functions: Data validation and diagnostic tools
Arrow Integration: More efficient data exchange

Summary

tidyr continues to play an important role as the standard tool for data reshaping in R in 2025. With intuitive functions centered on pivot_longer() and pivot_wider(), even complex data transformations can be written concisely. As part of the tidyverse ecosystem, it is indispensable in the preprocessing stage of data science workflows. By practicing tidy data principles, subsequent analysis and visualization can be significantly streamlined.