## Zero to R in 30 minutes

What is R?

R is a free, open-source programming language used for statistical computing and data visualization. It is cross-platform, which means that it runs on Windows, macOS, Linux, and other systems.

Why is it used?

In simple terms, R helps you explore a dataset and draw useful inferences from it. Those inferences can then inform decisions.

Data Manipulation: To shape the dataset into the required format.

Data Analysis: Over 4000 packages are available for statistical analyses such as hypothesis testing, model fitting, clustering, and machine learning.

Data Visualisation: Animated and interactive graphs can be created using R.
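As a minimal sketch of the data manipulation mentioned above, the following uses `mtcars`, a small dataset that ships with base R. It filters rows, selects columns, sorts the rows, adds a derived column, and computes group averages. The variable names and the `hp_per_cyl` column are illustrative choices, not part of any standard.

```r
# mtcars ships with base R (datasets package), so no file is needed
cars <- mtcars

# Keep only cars with more than 4 cylinders, and only three columns
big <- subset(cars, cyl > 4, select = c(mpg, cyl, hp))

# Sort by fuel economy, best first
big <- big[order(big$mpg, decreasing = TRUE), ]

# Add a derived (illustrative) column: horsepower per cylinder
big$hp_per_cyl <- big$hp / big$cyl

# Average mpg for each cylinder count in the full dataset
avg_mpg <- aggregate(mpg ~ cyl, data = cars, FUN = mean)

head(big)
avg_mpg
```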

How to use R?

R can be used directly through the console that comes with a standard installation. For beginners, however, an IDE (Integrated Development Environment) such as RStudio is recommended. It can be downloaded from the RStudio website.

To understand what an IDE is, think of MS Word and Notepad. Both perform the same task of writing, but the former offers an easy interface coupled with several useful tools.

What statistical operations can it perform?

R can perform a wide variety of statistical operations, including:

• Mean, Median, Mode, Standard Deviation, Variance, etc.
• Regression
• ANOVA
• Binomial Distribution
• Chi-square Test
• Analysis of Covariance (ANCOVA)
• Random Forest
• Survival Analysis
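The following minimal sketch runs a few of the operations listed above using only base R: a linear regression, a one-way ANOVA, a binomial probability, and a chi-square test. The regression and ANOVA use the built-in `mtcars` dataset. The 2x2 table of counts is made up purely for illustration.

```r
# Regression: fuel economy (mpg) as a linear function of car weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)

# ANOVA: does mean mpg differ across cylinder counts?
aov_fit <- aov(mpg ~ factor(cyl), data = mtcars)
summary(aov_fit)

# Binomial distribution: probability of exactly 3 heads in 10 fair coin tosses
p3 <- dbinom(3, size = 10, prob = 0.5)
p3

# Chi-square test of independence on a hypothetical 2x2 table of counts
counts <- matrix(c(30, 20, 15, 35), nrow = 2,
                 dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))
chi <- chisq.test(counts)
chi
```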

What type of graphs can it produce?

Many types of graphs can be created in R, including line plots, box plots, histograms, density curves, scatter plots, and bar plots.
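Box plots and density curves are mentioned above but are not part of the charts tutorial. Here is a minimal sketch of both using base R's `boxplot()` and `density()`. The `scores` values are only example numbers.

```r
# Example values
scores <- c(4, 2, 7, 5, 9, 2, 6, 8, 9, 2, 9)

# Box plot: shows the median, the quartiles and the spread of the values.
# boxplot() also returns (invisibly) the statistics it drew.
bp <- boxplot(scores, main = "Box plot of scores", ylab = "Score")

# Density curve: a smoothed estimate of the distribution of the values
d <- density(scores)
plot(d, main = "Density curve of scores")
```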

How to code in R?

A hands-on tutorial with guided comments is provided below in three parts:
1. Basic Operations
2. Basic Statistics
3. Charts and graphs

Basic Operations

```r
# ----------------------
# BASIC OPERATIONS IN R
# ----------------------

# Clear the console with CTRL + L (Windows and Ubuntu) or Option + Command + L (Mac)
# Anything written after the hash mark is a comment

# Simple math

# Assign values
x <- 10
y <- 3

# Add the assigned values
x + y

# Subtract the assigned values
x - y

# Multiplication
x * y

# Division
x / y

# Exponentiation (both forms give the same result)
x^y
x**y

# Remainder or modulus
x %% y

# Integer division: the quotient with the remainder discarded
x %/% y

# Logs and exponentials
vedang <- 1:10
vedang

# Natural logarithm
log(vedang)

# The above can also be assigned to a variable, v, which can then be printed:
v <- log(vedang)
v

# Exponential
exp(vedang)

# Logarithm with a chosen base
log(vedang, base = 10)

# Square root
sqrt(9)

# Factorial
factorial(5)

# Combinations: nCr
n <- 6
r <- 2
vedang <- choose(n, r)
vedang
ved <- choose(n, n - r)
ved
# (both values are the same, because nCr = nC(n-r))

# Rounding
v <- 326.358

# Round to 2 decimal places
round(v, digits = 2)

# Floor value
floor(v)

# Ceiling value
ceiling(v)

# Truncation
trunc(v)

# ----------------------
# VECTORS
# ----------------------

# Defining a vector
Vector1 <- c(10, 2, 1, 5, 7, 4)
Vector1

# Vector with an incrementing sequence of numbers
Vector2 <- seq(from = 2.5, to = 5.0, by = 0.5)
Vector2

# Element-wise linear combination (both vectors have 6 elements)
10 * Vector1 + 2 * Vector2

# Concatenation
c(Vector1, Vector2)

# Repetition
vedang <- c(5, 1, 3, 5)
rep(vedang, times = 4)

# Equally spaced vector with a defined length
seq(from = 1.8, to = 9, length.out = 6)

# ----------------------
# LOGICAL VECTORS
# ----------------------

# Scores of five games played by two players, P1 and P2
P1 <- c(12, 41, 78, 24, 75)
P2 <- c(11, 46, 87, 72, 96)

# P1 wins a game when P1's score is higher than P2's
P1.win <- P1 > P2
P1.win

# P2 wins a game when P2's score is higher than P1's
P2.win <- P2 > P1
P2.win

# Which games did P1 win?
which(P1.win)

# Which games did P2 win?
which(P2.win)

# P1's scores in the games P1 won
P1[P1.win]

# P2's scores in the games P2 won
P2[P2.win]

# Number of games won by P1 and by P2 (TRUE counts as 1)
sum(P1.win)
sum(P2.win)

# Did P2 win any of the games? All of them?
any(P2.win)
all(P2.win)

# ----------------------
# TEXT AND STRINGS
# ----------------------

# Define a string inside double quotes
v <- "My name is Vedang"

# length() counts the elements of a vector, so a single string has length 1
length(v)

# Number of characters (spaces count as characters)
nchar(v)

# Vector of several strings
v <- c("My", "name", "is", "Vedang")

# Number of elements
length(v)

# Number of characters in each string
nchar(v)

# ----------------------
# MATRICES
# ----------------------

# Defining a matrix with rows and columns, filled by row
Vedang <- matrix(c(4, 5, 6, 7, 8, 9), nrow = 3, ncol = 2, byrow = TRUE)
# Print the matrix
Vedang

# Defining a matrix with rows and columns, filled by column
Vedang <- matrix(c(4, 5, 6, 7, 8, 9), nrow = 3, ncol = 2, byrow = FALSE)
# Print the matrix
Vedang

# Print the element in the 3rd row and 2nd column
Vedang[3, 2]

# Print the 3rd row
Vedang[3, ]

# Print the 2nd column
Vedang[, 2]

# Label the row and column dimensions
dimnames(Vedang) <- list(c("row1", "row2", "row3"), c("col1", "col2"))
Vedang

# ----------------------
# LISTS
# ----------------------

# A list can hold elements of different types and lengths
a <- c(1, 2, 3)
b <- c("qw", "er", "ty", "ui", "op")
c <- c(TRUE, FALSE, TRUE)
vedang <- list(a, b, c, 77)
vedang

# Extract the 2nd and 4th elements (the result is itself a list)
vedang[c(2, 4)]
```
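The statistics part reads a CSV file into a data frame, which is a table-like structure not covered in the basic operations. Here is a minimal sketch of building one by hand. The `players` name and its values are made up for illustration.

```r
# A data frame holds columns of equal length, which may have different types
players <- data.frame(
  name  = c("P1", "P2", "P3"),
  score = c(12, 41, 78),
  won   = c(FALSE, TRUE, TRUE)
)
players

# Access a column by name with $
players$score

# Number of rows and columns
nrow(players)
ncol(players)

# Keep only the rows where won is TRUE
players[players$won, ]
```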

Basic Statistical Operations

```r
#--------------------------
# Set Directory
#--------------------------

# Set the working directory (use the directory where you have stored the CSV file)
setwd("C:/Users/vedangvatsa/documents")

# Print the working directory
print(getwd())

# Read the data file, which should be in the working directory
data <- read.csv(file = "sales.csv")

#--------------------------
# Basics
#--------------------------

# Descriptive statistics for each column
summary(data)

# Open the data editor; edits are kept only if assigned, e.g. data <- edit(data)
edit(data)

# Structure of the dataset
str(data)

# Names of the variables (columns) in the dataset
names(data)

# First six rows of the dataset (the default)
head(data)

# First n rows
head(data, n = 2)

# Last six rows (the default)
tail(data)

# Last n rows
tail(data, n = 2)

# Rows 3 to 5
data[3:5, ]

# Rows 3 to 5, columns 2 to 4
data[3:5, 2:4]

# Rows 3 to 5, columns 2 and 4
data[3:5, c(2, 4)]

#--------------------------
# Basic Statistics
#--------------------------

# Mean of a column
mean(data$Units_Shipped_Q1)

# Median of a column
median(data$Units_Shipped_Q1)

# Variance of a column
var(data$Units_Shipped_Q1)

# Standard deviation of a column
sd(data$Market_Share_Q1)

# Maximum value of a column
max(data$Units_Shipped_Q1)

# Minimum value of a column
min(data$Units_Shipped_Q1)

# Range: minimum and maximum values of a column
range(data$Units_Shipped_Q1)

# Quartiles (0%, 25%, 50%, 75% and 100% percentiles) of a column
quantile(data$Units_Shipped_Q1)

# Number of observations in a column
length(data$Units_Shipped_Q1)

# Market share in the row where Units_Shipped_Q1 is largest
data$Market_Share_Q1[which.max(data$Units_Shipped_Q1)]

# Market share in the row where Units_Shipped_Q1 is smallest
data$Market_Share_Q1[which.min(data$Units_Shipped_Q1)]
```
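The statistics list earlier includes the mode, but base R has no function for it. R's `mode()` returns an object's storage mode (for example `"numeric"`), not its most frequent value. Below is a minimal sketch of a hypothetical helper, `stat_mode()`, that uses `table()` to return the most frequent value or values. If you do not have `sales.csv`, the stand-in `data` frame uses the same column names with made-up values. Running it replaces any `data` object you have already loaded.

```r
# Hypothetical stand-in for sales.csv; the values are made up for illustration
data <- data.frame(
  Units_Shipped_Q1 = c(120, 95, 210, 95, 160),
  Market_Share_Q1  = c(0.12, 0.08, 0.25, 0.09, 0.18)
)

# Hypothetical helper: most frequent value(s) of a vector (ties return all of them)
stat_mode <- function(x) {
  counts <- table(x)                           # frequency of each distinct value
  modes  <- names(counts)[counts == max(counts)]
  if (is.numeric(x)) as.numeric(modes) else modes
}

mode(data$Units_Shipped_Q1)       # storage mode, not the statistical mode
mean(data$Units_Shipped_Q1)
stat_mode(data$Units_Shipped_Q1)
```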

Charts and Graphs

```r
#--------------------------
# Pie Chart
#--------------------------
# Syntax - pie(x, labels, radius, main, col, clockwise)

# Create data
vedang <- c(10, 20, 30, 50, 70)
label <- c("Paris", "London", "New York", "Sydney", "Moscow")

# Plot the pie chart
pie(vedang, label)

# Add main text and colors
pie(vedang, label, main = "This is the main text", col = rainbow(length(vedang)))

# Add a legend; cex sets the size of its text
pie(vedang, label, main = "This is the main text", col = rainbow(length(vedang)))
legend("topright", label, cex = 0.7, fill = rainbow(length(vedang)))

# To save a chart as an image in the working directory:
# open a PNG device with a file name, draw the chart, then close the device
png(filename = "pie_chart.png")
pie(vedang, label, main = "This is the main text", col = rainbow(length(vedang)))
legend("topright", label, cex = 0.7, fill = rainbow(length(vedang)))
dev.off()

#--------------------------
# 3D Pie Chart
#--------------------------
# Install the plotrix package (needed only once)
install.packages("plotrix")

# Load the package
library(plotrix)

# Plot the 3D chart
pie3D(vedang, labels = label, explode = 0.5, main = "This is the main text",
      col = rainbow(length(vedang)))
legend("topright", label, cex = 0.7, fill = rainbow(length(vedang)))

# Save the chart as an image
png(filename = "3D_pie_chart.png")
pie3D(vedang, labels = label, explode = 0.5, main = "This is the main text",
      col = rainbow(length(vedang)))
legend("topright", label, cex = 0.7, fill = rainbow(length(vedang)))
dev.off()

#--------------------------
# Bar Plot
#--------------------------
# Syntax - barplot(H, xlab, ylab, main, names.arg, col)

# Plot the bar chart
barplot(vedang)

# Add features: bar names, X and Y axis labels, main text, colors and a legend
barplot(vedang, names.arg = label, xlab = "Cities", ylab = "Score",
        main = "This is the main text", col = rainbow(length(vedang)))
legend("topleft", label, cex = 0.7, fill = rainbow(length(vedang)))

# Save the chart as an image
png(filename = "bar_chart.png")
barplot(vedang, names.arg = label, xlab = "Cities", ylab = "Score",
        main = "This is the main text", col = rainbow(length(vedang)))
legend("topleft", label, cex = 0.7, fill = rainbow(length(vedang)))
dev.off()

#--------------------------
# Stacked Bar Chart
#--------------------------

# Create data
cities <- c("Paris", "London", "New York", "Sydney", "Moscow")
sport <- c("Soccer", "Rugby", "Baseball")

# Matrix of values: one row per sport, one column per city
values <- matrix(c(3, 7, 1, 8, 11, 7, 2, 4, 13, 10, 15, 2, 9, 9, 4),
                 nrow = 3, ncol = 5, byrow = TRUE)

# Plot the stacked bar chart
barplot(values)

# Add features; use one color per sport (one per row of the matrix)
barplot(values, main = "Score value", names.arg = cities, xlab = "Cities",
        ylab = "Score", col = rainbow(length(sport)))
legend("topleft", sport, cex = 0.7, fill = rainbow(length(sport)))

# Save the chart as an image
png(filename = "Stacked_bar_chart.png")
barplot(values, main = "Score value", names.arg = cities, xlab = "Cities",
        ylab = "Score", col = rainbow(length(sport)))
legend("topleft", sport, cex = 0.7, fill = rainbow(length(sport)))
dev.off()

#--------------------------
# Histogram
#--------------------------
# Syntax - hist(v, main, xlab, xlim, ylim, breaks, col, border)

# Create data
vedang <- c(4, 2, 7, 5, 9, 2, 6, 8, 9, 2, 9)

# Plot the histogram
hist(vedang)

# Add features to the histogram
hist(vedang, main = "Scores", xlab = "Weight", ylab = "Frequency", col = "red")

# Set the axis ranges with xlim and ylim; breaks suggests the number of bins
hist(vedang, xlab = "Weight", ylab = "Frequency", col = "red", border = "yellow",
     xlim = c(0, 12), ylim = c(0, 10), breaks = 2)

# Save the chart as an image
png(filename = "histogram.png")
hist(vedang, xlab = "Weight", ylab = "Frequency", col = "red", border = "yellow",
     xlim = c(0, 12), ylim = c(0, 10), breaks = 2)
dev.off()

#--------------------------
# Line Plot
#--------------------------
# Syntax - plot(v, type, col, xlab, ylab)

# Create data
vedang <- c(7, 14, 9, 20, 26, 19, 31)
plot(vedang, type = "o", col = "red", xlab = "Player", ylab = "Score")

# type = "p": points only
# type = "l": lines only
# type = "o": points and lines

# Add a second line to the existing plot
line2 <- c(13, 9, 17, 22, 10)
lines(line2, type = "o", col = "blue")

# Save the chart as an image
png(filename = "LineChart.png")
plot(vedang, type = "o", col = "red", xlab = "Player", ylab = "Score")
lines(line2, type = "o", col = "blue")
dev.off()

#--------------------------
# Scatter Plot
#--------------------------
# Syntax - plot(x, y, main, xlab, ylab, xlim, ylim, axes)

a <- c(2, 6, 4, 5, 7, 9, 15, 16, 17, 3, 14)
b <- c(6, 13, 3, 6, 8, 9, 5, 4, 3, 14, 17)
plot(a, b)

# Save the chart under its own file name so LineChart.png is not overwritten
png(filename = "ScatterPlot.png")
plot(a, b)
dev.off()
```
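Each chart is saved with the same three steps: open a PNG device, draw, and close the device. As a minimal sketch, those steps can be wrapped in a small helper. `save_png()` is a hypothetical name, not part of base R. It uses `on.exit()` so the device is closed even if drawing fails. The pixel width and height are illustrative defaults.

```r
# Hypothetical helper: open a PNG device, draw, and always close the device
save_png <- function(filename, draw, width = 800, height = 600) {
  png(filename = filename, width = width, height = height)
  on.exit(dev.off())   # runs when the function exits, even after an error
  draw()
  invisible(filename)
}

# Usage: save a scatter plot to the working directory
a <- c(2, 6, 4, 5, 7, 9, 15, 16, 17, 3, 14)
b <- c(6, 13, 3, 6, 8, 9, 5, 4, 3, 14, 17)
save_png("scatter_plot.png", function() plot(a, b, main = "Scatter plot"))
```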

