R survival kit

Here are basic commands for exploratory data analysis in R.

Libraries: ggplot2, dplyr, tidyr, GGally, reshape2

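  • names(): display the names of the variables (columns).
  • length(): count the elements of a vector, e.g. the number of samples in one column.
  • scale_x_continuous(): set the breaks and limits of a continuous x axis
  • scale_x_log10(): put the x axis on a log10 scale
  • geom_histogram(): histogram
  • geom_point(): scatter plot
  • geom_line(): line plot
  • geom_smooth(): smoothed trend line
  • with(): evaluate an expression using the columns of a data frame
  • subset(): keep the rows (and optionally columns) that meet a condition
  • table(): count the samples at each value of a variable
  • cut(): cut a numeric variable into intervals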
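
Example: a histogram of tenure in years, assuming pf is a data frame of Facebook users with a tenure column measured in days.

library(ggplot2)

ggplot(pf, aes(x = tenure / 365)) +
  geom_histogram(color = 'black', fill = '#F79420') +
  scale_x_continuous(breaks = seq(1, 7, 1), limits = c(0, 7)) +
  xlab('Number of years using Facebook') +
  ylab('Number of users in sample')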
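
The base R helpers in the list above (names, length, with, subset, table, cut) can be tried on a small data frame. The sketch below uses a made-up data frame df with hypothetical age and gender columns; it is only meant to show what each call returns.

```r
# Hypothetical toy data standing in for a real data set
df <- data.frame(
  age    = c(15, 22, 37, 45, 63),
  gender = c("f", "m", "f", "m", "f")
)

names(df)        # column names
length(df$age)   # number of samples in one column

# with(): refer to columns without repeating df$
mean_age <- with(df, mean(age))

# subset(): keep only the rows that satisfy a condition
adults <- subset(df, age >= 18)

# table(): count how often each value occurs
gender_counts <- table(df$gender)

# cut(): bin a numeric variable into intervals (returns a factor)
age_group <- cut(df$age, breaks = c(0, 18, 40, 100))
table(age_group)
```

Combining cut() with table() is a quick way to see how samples spread across ranges before drawing a histogram.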

package: tidyr

A package that reshapes the layout of data sets.

gather(): Collapses multiple columns into two columns, a key and a value.

spread(): Generates multiple columns from two columns, a key and a value.

separate(): Splits one column into several at a separator string.

unite(): Combines several columns into a single column.
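
As a rough illustration of the four tidyr functions, the sketch below reshapes a small made-up table. The data frames (wide, dates) and their column names are invented for the example.

```r
library(tidyr)

# Hypothetical wide table: one row per person, one column per year
wide <- data.frame(
  name  = c("ana", "ben"),
  y2016 = c(10, 20),
  y2017 = c(11, 21)
)

# gather(): collapse the year columns into a key column and a value column
long <- gather(wide, key = "year", value = "score", y2016, y2017)

# spread(): the inverse, one column per distinct key
wide_again <- spread(long, key = year, value = score)

# separate(): split one column into two at the "-" separator
dates <- data.frame(date = c("2017-05", "2017-06"))
parts <- separate(dates, date, into = c("year", "month"), sep = "-")

# unite(): paste the pieces back into a single column
joined <- unite(parts, "date", year, month, sep = "-")
```

Recent tidyr releases also offer pivot_longer() and pivot_wider() as successors to gather() and spread(); the older functions still work.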

package: dplyr


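select(): pick columns

filter(): keep rows that meet a condition

mutate(): add or change columns

summarise(): reduce rows to summary values

arrange(): sort rows

group_by(): group rows so that later verbs work per group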
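
These verbs are usually chained with the pipe. A minimal sketch, assuming a made-up sales data frame with hypothetical region, units and price columns:

```r
library(dplyr)

# Hypothetical toy data
sales <- data.frame(
  region = c("north", "north", "south", "south"),
  units  = c(10, 30, 5, 15),
  price  = c(2, 2, 4, 4)
)

result <- sales %>%
  filter(units > 5) %>%                 # keep rows with more than 5 units
  mutate(revenue = units * price) %>%   # add a computed column
  select(region, revenue) %>%           # keep only the columns needed
  group_by(region) %>%                  # one group per region
  summarise(total = sum(revenue)) %>%   # one row per group
  arrange(desc(total))                  # largest total first
```

Here result has one row per region, sorted by total revenue.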

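bind_cols(): bind data frames side by side, matching rows by position

bind_rows(): stack data frames on top of each other

union(): rows that appear in either data frame, duplicates removed

intersect(): rows that appear in both data frames

setdiff(): rows that appear in the first data frame but not in the second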
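
The difference between binding and the set operations is easiest to see on two tiny data frames. The sketch below uses made-up data frames a and b that share a single id column.

```r
library(dplyr)

# Hypothetical toy data: id 3 appears in both
a <- data.frame(id = c(1, 2, 3))
b <- data.frame(id = c(3, 4))

stacked <- bind_rows(a, b)   # all rows of a, then all rows of b (3 is kept twice)
either  <- union(a, b)       # rows in a or b, duplicates removed
both    <- intersect(a, b)   # rows present in a and in b
only_a  <- setdiff(a, b)     # rows in a that are not in b

# bind_cols(): the inputs must have the same number of rows
labelled <- bind_cols(a, data.frame(label = c("x", "y", "z")))
```

The set operations compare whole rows, so both data frames need the same columns.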

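left_join(): keep every row of the first table and add matching columns from the second

inner_join(): keep only rows that have a match in both tables

semi_join(): keep rows of the first table that have a match in the second, without adding columns

anti_join(): keep rows of the first table that have no match in the second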
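
A small sketch of how the four joins differ, using two made-up tables (users and orders) keyed on a hypothetical id column:

```r
library(dplyr)

# Hypothetical toy data: user 1 has two orders, users 2 and 3 have none,
# and order id 4 belongs to no known user
users  <- data.frame(id = c(1, 2, 3), name = c("ana", "ben", "cy"))
orders <- data.frame(id = c(1, 1, 4), item = c("pen", "ink", "cap"))

left  <- left_join(users, orders, by = "id")   # all users; item is NA where no order exists
inner <- inner_join(users, orders, by = "id")  # only users that have orders
semi  <- semi_join(users, orders, by = "id")   # users with orders, users' columns only
anti  <- anti_join(users, orders, by = "id")   # users without any order
```

semi_join() and anti_join() filter the first table and never duplicate its rows, while left_join() and inner_join() repeat a row once per match.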


apply(): applies a function over the rows or columns of a matrix or data frame.

sapply(): a wrapper around lapply() that simplifies the result to a vector or matrix when possible, instead of returning a list.

lapply(): applies a function to each element of a list or vector and returns a list of the same length.

tapply(): splits a vector into groups defined by a second variable and applies a function to each group.
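
The four functions differ mainly in what they iterate over and what they return. A minimal base R sketch with made-up inputs:

```r
# apply(): over rows (1) or columns (2) of a matrix
m <- matrix(1:6, nrow = 2)          # rows are 1 3 5 and 2 4 6
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)

# lapply() returns a list, sapply() simplifies it to a named vector
scores    <- list(a = c(1, 2, 3), b = c(4, 5))
as_list   <- lapply(scores, mean)
as_vector <- sapply(scores, mean)

# tapply(): split one vector by the groups in another, then summarise
height   <- c(150, 160, 170, 180)
group    <- c("x", "x", "y", "y")
by_group <- tapply(height, group, mean)
```

For grouped summaries on data frames, tapply() does roughly what group_by() followed by summarise() does in dplyr.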


ggpairs(): a matrix of pairwise scatter plots for several variables, from the GGally package.
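
A minimal sketch of ggpairs(), using the iris data set that ships with R as stand-in data:

```r
library(GGally)

# Pairwise plots for the four numeric columns of iris
p <- ggpairs(iris, columns = 1:4)
p
```

With many variables the plot matrix gets slow and crowded, so it helps to pass only the columns of interest.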

Written on May 28, 2017