Let's take a hypothetical example where you have scraped data from a website displaying football/soccer results. You find yourself with this data:
df <- data.frame(
  match_id  = c(1, 2, 3, 4),
  home_team = c("Chelsea", "Arsenal", "Liverpool", "Fulham"),
  away_team = c("Newcastle", "West Ham", "Tottenham", "Everton"),
  score     = c("2-2", "2-1", "1-1", "1-3")
)
df

match_id  home_team  away_team  score
1         Chelsea    Newcastle  2-2
2         Arsenal    West Ham   2-1
3         Liverpool  Tottenham  1-1
4         Fulham     Everton    1-3
You want to split the score column into two columns: one for the goals scored by the home team and one for the goals scored by the away team.
tidyr::separate
argument | description
data | the dataframe or tibble
col | the source column to split
into | the target columns to receive the split data
sep | the separator to split the column by
# install the tidyr package if not installed
# install.packages('tidyr')
tidyr::separate(data = df,
                col = 'score',
                into = c('home_score', 'away_score'),
                sep = '-')

match_id  home_team  away_team  home_score  away_score
1         Chelsea    Newcastle  2           2
2         Arsenal    West Ham   2           1
3         Liverpool  Tottenham  1           1
4         Fulham     Everton    1           3
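Note that separate() leaves the new columns as character vectors by default, so the scores above are text rather than numbers. The following is a minimal sketch, assuming you want integer goal counts: the convert = TRUE argument asks separate() to run type conversion on the new columns. The total_goals column is a hypothetical addition, included only to show that arithmetic works after conversion.

```r
library(tidyr)

# Same example data as above
df <- data.frame(
  match_id  = c(1, 2, 3, 4),
  home_team = c("Chelsea", "Arsenal", "Liverpool", "Fulham"),
  away_team = c("Newcastle", "West Ham", "Tottenham", "Everton"),
  score     = c("2-2", "2-1", "1-1", "1-3")
)

# convert = TRUE turns the split pieces into integers instead of strings
scores <- separate(df,
                   col = "score",
                   into = c("home_score", "away_score"),
                   sep = "-",
                   convert = TRUE)

# Hypothetical follow-up column: total goals per match
scores$total_goals <- scores$home_score + scores$away_score
```

Without convert = TRUE, the last line would fail, because R cannot add two character values.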
By default, the target columns replace the source column in the resulting dataframe. To retain the source column, set the remove argument to FALSE:
tidyr::separate(data = df,
                col = 'score',
                into = c('home_score', 'away_score'),
                sep = '-',
                remove = FALSE)

match_id  home_team  away_team  score  home_score  away_score
1         Chelsea    Newcastle  2-2    2           2
2         Arsenal    West Ham   2-1    2           1
3         Liverpool  Tottenham  1-1    1           1
4         Fulham     Everton    1-3    1           3
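Recent versions of tidyr mark separate() as superseded in favour of separate_wider_delim(). As a rough sketch, assuming tidyr 1.3.0 or later is installed, the same split with the source column retained might look like this. Here cols_remove = FALSE plays the role of remove = FALSE, and the result name wide is just an illustrative choice.

```r
library(tidyr)

# Same example data as above
df <- data.frame(
  match_id  = c(1, 2, 3, 4),
  home_team = c("Chelsea", "Arsenal", "Liverpool", "Fulham"),
  away_team = c("Newcastle", "West Ham", "Tottenham", "Everton"),
  score     = c("2-2", "2-1", "1-1", "1-3")
)

# Split score on "-" into two named columns and keep the original column
wide <- separate_wider_delim(
  df,
  cols = score,
  delim = "-",
  names = c("home_score", "away_score"),
  cols_remove = FALSE
)
```

As with separate(), the new columns are character vectors, so convert them (for example with as.integer()) before doing arithmetic.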