R survival kit

Here are basic commands for exploratory data analysis in R.

Libraries: ggplot2, dplyr, tidyr, GGally, reshape2

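names(): display the names of the variables (columns).
length(): count the elements of a vector, e.g. the samples in one variable; for a whole data frame use nrow().
scale_x_continuous(): set the breaks and limits of a continuous x axis.
scale_x_log10(): put the x axis on a log10 scale.
geom_histogram(): histogram
geom_point(): scatter plot
geom_line(): line plot
geom_smooth(): smoothed trend line through the points
with(): evaluate an expression using the columns of a data frame
subset(): keep the rows (and optionally columns) that match a condition
table(): count the samples for each value of a variable
cut(): cut a numeric variable into intervals (bins)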
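
As a rough illustration of the base R functions listed above, here is a minimal sketch. It uses the built-in mtcars data set as a stand-in for your own data, and the break points passed to cut() are arbitrary choices made for this example.

```r
# Built-in data set, used here only as a stand-in for your own data frame.
df <- mtcars

names(df)        # variable (column) names
nrow(df)         # number of samples (rows)
length(df$mpg)   # number of values in one variable

# table(): count samples per value of one variable
cyl_counts <- table(df$cyl)

# cut(): bin a numeric variable into intervals (break points chosen arbitrarily)
df$hp_bucket <- cut(df$hp, breaks = c(0, 100, 200, 400))
bucket_counts <- table(df$hp_bucket)

# subset(): keep rows matching a condition
four_cyl <- subset(df, cyl == 4)

# with(): refer to columns without repeating the data frame name
mean_mpg_4cyl <- with(four_cyl, mean(mpg))
```

Combining cut() with table() is a quick way to see how a numeric variable is distributed before drawing a histogram.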

library(ggplot2)

# pf is assumed to be a data frame whose tenure column is measured in days.
ggplot(aes(x = tenure / 365), data = pf) +
  geom_histogram(color = 'black', fill = '#F79420') +
  scale_x_continuous(breaks = seq(1, 7, 1), limits = c(0, 7)) +
  xlab('Number of years using Facebook') +
  ylab('Number of users in sample')

package: tidyr

A package that reshapes the layout of data sets.

gather(): Collapses multiple columns into two columns.

spread(): Generates multiple columns from two columns.

separate(): Splits one column into several columns at a separator string.

unite(): Pastes several columns together into a single column.
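
The four functions can be tried on a tiny table. This is only a sketch: the data frame and its column names (student, math, art) are made up for illustration. Newer tidyr releases also provide pivot_longer() and pivot_wider() as successors to gather() and spread(), which still work.

```r
library(tidyr)

# Hypothetical wide table: one row per student, one column per exam.
wide <- data.frame(
  student = c("ann", "bob"),
  math    = c(90, 70),
  art     = c(85, 95),
  stringsAsFactors = FALSE
)

# gather(): collapse the exam columns into a key column and a value column
long <- gather(wide, key = "exam", value = "score", math, art)

# spread(): the inverse, one column per exam again
wide_again <- spread(long, key = exam, value = score)

# unite(): paste two columns into one, separated by "_"
united <- unite(long, col = "id", student, exam, sep = "_")

# separate(): split that column back at the separator
separated <- separate(united, col = id, into = c("student", "exam"), sep = "_")
```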

package: dplyr

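select(): pick columns by name.

filter(): keep the rows that match a condition.

mutate(): add or modify columns.

summarise(): collapse many rows into summary values.

arrange(): sort rows.

group_by(): group rows so that later verbs work per group.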
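
These six verbs are usually chained with the pipe operator. The sketch below runs on the built-in mtcars data set; the weight cutoff and the kpl column are arbitrary choices for the example, not part of dplyr.

```r
library(dplyr)

# mtcars is a built-in data set, used only for illustration.
by_cyl <- mtcars %>%
  select(mpg, cyl, wt) %>%                       # keep three columns
  filter(wt < 4) %>%                             # keep the lighter cars
  mutate(kpl = mpg * 0.425) %>%                  # derived column: approx. km per litre
  group_by(cyl) %>%                              # one group per cylinder count
  summarise(n = n(), mean_kpl = mean(kpl)) %>%   # one summary row per group
  arrange(desc(mean_kpl))                        # highest average first
```

Each step takes a data frame and returns a data frame, which is what makes the chain readable from top to bottom.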

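bind_cols(): join tables side by side (columns), matching rows by position.

bind_rows(): stack tables on top of each other (rows).

union(): rows that appear in either table, without duplicates.

intersect(): rows that appear in both tables.

setdiff(): rows that appear in the first table but not in the second.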
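
A small sketch of the binding and set functions, using two made-up tables (a and b are hypothetical) that share one identical row:

```r
library(dplyr)

# Two hypothetical tables with the same columns.
a <- data.frame(id = c(1, 2, 3), x = c("p", "q", "r"), stringsAsFactors = FALSE)
b <- data.frame(id = c(3, 4),    x = c("r", "s"),      stringsAsFactors = FALSE)

stacked <- bind_rows(a, b)   # all rows of a followed by all rows of b
either  <- union(a, b)       # distinct rows found in a or b
both    <- intersect(a, b)   # rows found in both a and b
only_a  <- setdiff(a, b)     # rows found in a but not in b

# bind_cols(): paste tables side by side; rows are matched by position
side_by_side <- bind_cols(a, data.frame(y = c(TRUE, FALSE, TRUE)))
```

Note that bind_rows() keeps the shared row twice, while union() keeps it once.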

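left_join(): keep every row of the left table and add the matching columns from the right table.

inner_join(): keep only the rows that have a match in both tables.

semi_join(): keep the rows of the left table that have a match in the right table, without adding columns.

anti_join(): keep the rows of the left table that have no match in the right table.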
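
The difference between the four joins is easiest to see on a small example. The users and orders tables below are hypothetical and share the key column id.

```r
library(dplyr)

# Hypothetical tables sharing the key column "id".
users  <- data.frame(id = c(1, 2, 3), name = c("ann", "bob", "cy"),
                     stringsAsFactors = FALSE)
orders <- data.frame(id = c(1, 1, 3, 4), amount = c(10, 20, 5, 7))

left  <- left_join(users, orders, by = "id")   # all users; amount is NA for bob
inner <- inner_join(users, orders, by = "id")  # users with orders, one row per match
semi  <- semi_join(users, orders, by = "id")   # users with at least one order, no order columns
anti  <- anti_join(users, orders, by = "id")   # users without any order
```

left_join() and inner_join() can return more rows than the left table has when a key matches several times; semi_join() and anti_join() only ever filter the left table.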

apply(): applies a function over the rows or columns of a matrix or data frame.

sapply(): a wrapper around lapply() that simplifies the result to a vector or matrix when it can, instead of returning a list.

lapply(): applies a function to each element of a list (or vector) and returns a list of the same length.

tapply(): splits a vector into groups defined by a factor and applies a function to each group.
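
The apply family is part of base R, so no package is needed. A minimal sketch, using a small matrix, a small list, and the built-in mtcars data set purely as examples:

```r
m <- matrix(1:6, nrow = 2)        # 2 x 3 matrix with columns (1,2), (3,4), (5,6)
row_sums <- apply(m, 1, sum)      # MARGIN = 1: one result per row
col_sums <- apply(m, 2, sum)      # MARGIN = 2: one result per column

lst <- list(a = 1:3, b = 4:6)
as_list   <- lapply(lst, mean)    # always a list, same length as the input
as_vector <- sapply(lst, mean)    # simplified to a named numeric vector

# tapply(): split mpg by cylinder count, then average each piece
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
```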

ggpairs(): from GGally; draws a matrix of pairwise plots (scatter plots for pairs of numeric variables) for several variables at once.

Written on May 28, 2017