R survival kit
Here are basic commands to do data exploratory in R.
Library: ggplot2, dplyr, tidyr, ggally, reshape2
- names: display name of variables.
- length: count number of samples.
- scale_x_continuous
- scale_x_log10()
- geom_histo
- geom_point: scatter plot
- geom_line: line plot
- geom_smooth:
- with()
- subset()
- table(): counting samples in 1 variable
- cut(): cut a variable
ggplot(aes(x = tenure / 365), data = pf) +
geom_histogram(color = 'black', fill = '#F79420') +
scale_x_continuous(breaks = seq(1, 7, 1), limits = c(0, 7)) +
xlab('Number of years using Facebook') +
ylab('Number of users in sample')
package: tidyr
A package that reshapes the layout of data sets.
gather(): Collapses multiple columns into two columns.
spread(): Generates multiple columns from two columns.
seperate(): separate splits a column by a character string separator.
unite(): Unite unites columns into a single column.
package: dplyr
select()
filter()
mutate()
summarise()
arrange()
group_by()
bind_cols(): join columns
bind_rows(): join rows
union(): join row if there is difference
intersect(): get similar
setdiff(): get different
left_join()
inner_join()
semi_join()
anti_join()
apply()
sapply(): sapply is wrapper class to lapply with difference being it returns vector or matrix instead of list object.
lapply(): lapply function is applied for operations on list objects and returns a list object of same length of original set.
tapply(): tapply() is a very powerful function that lets you break a vector into pieces and then apply some function to each of the pieces.
ggpairs(): multiple scatter plots. — Cheatsheet: