Zero to R in 30 minutes

Source: Coursera

What is R?

R is a free, open-source programming language for statistical computing and data visualization. It is cross-platform, which means it runs on Windows, macOS, Linux, and other systems.

Why is it used?

In simple terms, R helps you explore a dataset and draw useful inferences from it, which can then inform decisions.

Data Manipulation: To shape the dataset into the required format.

Data Analysis: Over 4000 packages are available for statistical analyses such as hypothesis testing, model fitting, clustering, and machine learning.

Data Visualisation: Animated and interactive graphs can be created using R.
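As a rough illustration of the first two tasks, the base-R sketch below shapes a tiny data frame (adds a derived column, filters and sorts rows) and then summarises it. The `sales` data frame and all of its values are invented for illustration; only base R functions are used.

```r
# A tiny hypothetical dataset (values invented for illustration)
sales <- data.frame(
  city  = c("Paris", "London", "Paris", "Sydney", "London"),
  units = c(10, 20, 30, 50, 70),
  price = c(2.5, 3.0, 2.5, 4.0, 3.0)
)

# Data manipulation: add a derived column, keep rows with at least 20 units,
# and sort them by revenue, largest first
sales$revenue <- sales$units * sales$price
big <- subset(sales, units >= 20)
big <- big[order(-big$revenue), ]
print(big)

# Data analysis: total units shipped per city
totals <- aggregate(units ~ city, data = sales, FUN = sum)
print(totals)
```

`subset()` and `order()` cover most everyday filtering and sorting; dedicated packages exist for larger jobs, but they are not needed here.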

How to use R?

Download and install R from https://cran.r-project.org (on Windows this includes a basic graphical interface, RGui).

For beginners, an IDE (Integrated Development Environment) such as RStudio is recommended. It can be downloaded from the RStudio website and runs on top of an existing R installation.

To understand what an IDE is, think of MS Word and Notepad. Both perform the same task of writing, but the former offers an easy interface coupled with several useful tools.

What statistical operations can it perform?

R can perform a wide variety of statistical operations, including:

  • Mean, Median, Mode, Standard Deviation, Variance, etc.
  • Regression
  • ANOVA
  • Binomial Distribution
  • Chi-square test
  • Analysis of covariance
  • Random Forest
  • Survival Analysis
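A few of these operations in base R, as a minimal sketch. Note that R has no built-in function for the statistical mode (`mode()` returns an object's storage mode, such as "numeric"), so `stat_mode()` below is a hypothetical helper. The vector `x` is invented, and the regression, ANOVA and chi-square examples use R's built-in `mtcars` dataset.

```r
x <- c(2, 4, 4, 5, 7, 9, 4, 5)

# Descriptive statistics
mean(x); median(x); sd(x); var(x)

# Statistical mode: not built in (mode(x) just returns "numeric"),
# so a small hypothetical helper counts values with table()
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])  # ties go to the smallest value
}
stat_mode(x)

# Binomial distribution: P(X = 3) for 10 trials with success probability 0.5
dbinom(3, size = 10, prob = 0.5)

# Linear regression: fuel economy (mpg) against car weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)

# One-way ANOVA: does mpg differ by number of cylinders?
summary(aov(mpg ~ factor(cyl), data = mtcars))

# Chi-square test of independence between two categorical variables
chisq.test(table(mtcars$am, mtcars$vs))
```

Random forests and survival analysis are usually done with add-on packages, so they are left out of this base-R sketch.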

What type of graphs can it produce?

R can produce many types of graphs, including line plots, box plots, histograms, density curves, scatter plots, and bar plots.

Source: RStudio
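The charts section of this tutorial covers pie, bar, histogram, line and scatter charts but not box plots or density curves. A minimal base-R sketch of those two follows, reusing the scores from the histogram example. It draws into a temporary PDF file (an assumption made so the example also runs on a machine without a screen).

```r
# Scores reused from the histogram example
scores <- c(4, 2, 7, 5, 9, 2, 6, 8, 9, 2, 9)

# Open a temporary PDF file as the graphics device
out <- tempfile(fileext = ".pdf")
pdf(out)

# Box plot: median, quartiles and whiskers in one picture
boxplot(scores, main = "Box plot", ylab = "Score")

# Density curve: a smoothed estimate of the distribution of the scores
plot(density(scores), main = "Density curve", xlab = "Score")

# Close the device, which writes the file
dev.off()
file.exists(out)
```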

How to code in R?

A hands-on tutorial with guided comments is provided below, in three parts:
1. Basic Operations
2. Basic Statistics
3. Charts and graphs

Basic Operations

 # ----------------------
# BASIC OPERATIONS IN R
# ----------------------
 
# Clear the console with Ctrl + L (Windows and Ubuntu) or Option + Command + L (Mac)
 
# Anything written after the hash mark is a comment
 
# Simple math
 
# Assign values
x <- 10; y <- 3
 
# Add the assigned values
x + y
 
# Subtract the assigned values
x-y
 
# Multiplication
x*y
 
# Division
x/y
 
# Exponentiation (^ is standard; ** also works)
x^y
x**y
 
# Remainder or Modulus
x%%y
 
# Integer division: the quotient rounded down to a whole number
x%/%y
 
# Logs and exponentials: start with the vector 1, 2, ..., 10
vedang <- 1:10
vedang
 
# Natural logarithm
log(vedang)
 
# The result can also be assigned to a variable, v, and then printed:
v <- log(vedang)
v
 
# Exponential
exp(vedang)
 
# Logarithm with a specified base
log(vedang, base = 10)
 
# Square root
sqrt(9)
 
# Factorial
factorial(5)
 
# Combinations: nCr
n <- 6; r <- 2
vedang <- choose(n, r)
vedang
ved <- choose(n, n - r)
ved
# Both values are 15, because nCr = nC(n-r)
 
# Rounding off
v <- 326.358
 
# Round to 2 decimal places
round(v, digits = 2)
 
# Floor value
floor(v)
 
# Ceiling value
ceiling(v)
 
# Truncation
trunc(v)
 
# ----------------------
# VECTORS
# ----------------------
 
# Defining a Vector
Vector1 <- c(10,2,1,5,7,4)
Vector1
 
# Vector with an incrementing sequence of numbers
Vector2 <- seq(from = 2.5, to = 5.0, by = 0.5)
Vector2
 
# Element-wise linear combination of the two vectors
10* Vector1 + 2*Vector2
 
# Combine (concatenate) the two vectors
c(Vector1,Vector2)
 
# Repetition
vedang <- c(5,1,3,5)
rep(vedang,times=4)
 
# Equally spaced vector with a defined length
seq(from = 1.8, to = 9, length.out = 6)
 
# ----------------------
#  Logical Vectors
# ----------------------
 
# Scores of two players, P1 and P2, over five games
P1 <- c(12,41,78,24,75)
P2 <- c(11,46,87,72,96)
 
# P1 wins when the score of P1 is more than that of P2
P1.win <- P1 > P2
P1.win
 
# P2 wins when the score of P2 is more than that of P1
P2.win <- P2 > P1
P2.win
 
# Which games did P1 win?
which(P1.win)
 
# Which games did P2 win?
which(P2.win)
 
# Scores of P1 in the games P1 won
P1[P1.win]
 
# Scores of P2 in the games P2 won
P2[P2.win]
 
# Number of matches won by P1 and P2 (TRUE counts as 1)
sum(P1.win)
sum(P2.win)
 
# Did P2 win any or all of the matches?
any(P2.win)
all(P2.win)
 
# ----------------------
# Text and Strings
# ----------------------
 
# Define a string inside double quotes
v <- "My name is Vedang"
 
# length() counts vector elements, so a single string has length 1
length(v)
 
# Number of characters
nchar(v)
# Spaces count as characters too, so this returns 17
 
# Vector with different strings
v  <- c("My","name","is","Vedang")
 
# The length is now 4, one for each string
length(v)
 
# nchar() works element-wise: the number of characters in each string
nchar(v)
 
# ----------------------
# MATRICES
# ----------------------
 
# Define a 3 x 2 matrix, filled row by row
Vedang <- matrix(c(4, 5, 6, 7, 8, 9), nrow=3,ncol=2,byrow=TRUE)
# Print the matrix
Vedang
 
# Define the same values as a 3 x 2 matrix, filled column by column
Vedang <- matrix(c(4, 5, 6, 7, 8, 9), nrow=3,ncol=2,byrow=FALSE)
# Print the matrix
Vedang
 
# Print the element in the 3rd row and 2nd column
Vedang[3,2]
 
# Print the 3rd row
Vedang[3,]
 
# Print the 2nd column
Vedang[,2]     
 
# Label the Row and Column dimensions
dimnames(Vedang) = list(c("row1", "row2","row3"),c("col1", "col2"))
Vedang
 
# ----------------------
# LISTS
# ----------------------
 
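# A list can hold elements of different types
a <- c(1, 2, 3)
b <- c("qw", "er", "ty", "ui", "op")
d <- c(TRUE, FALSE, TRUE)   # avoid naming a variable c, the name of the c() function
vedang <- list(a, b, d, 77)
vedang
 
# Extract the 2nd and 4th elements (returned as a smaller list)
vedang[c(2, 4)]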
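The difference between single and double brackets on lists often trips up beginners. The sketch below uses a hypothetical named list, `player`, to show that `[ ]` returns a smaller list, while `[[ ]]` and `$` return the element itself.

```r
# A hypothetical named list; names make elements easier to reach
player <- list(name = "P1", scores = c(12, 41, 78), active = TRUE)

# Single brackets return a smaller list
sub <- player["scores"]
class(sub)        # "list"

# Double brackets (or $) return the element itself
s <- player[["scores"]]
class(s)          # "numeric"
player$name

# Elements can be added or replaced by name
player$team <- "Red"
names(player)
```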

Basic Statistical Operations

Download the required CSV file, sales.csv, before running this part.

 #--------------------------
# Set Directory
#--------------------------
 
# Set working directory
setwd("C:/Users/vedangvatsa/documents")
# Replace the path above with the directory where you saved sales.csv
 
# Print working directory
print(getwd())
 
# Read the data file which should be in the set working directory
data <- read.csv(file='sales.csv')
 

#--------------------------
# Basics  
#--------------------------
 
# Summary of the descriptive statistics and frequencies
summary(data)
 
# Open the data editor (assign the result, e.g. data <- edit(data), to keep changes)
edit(data)
 
# Structure of the dataset
str(data)
 
# List of variables in the dataset
names(data)
 
# First six rows of the dataset
head(data)
 
# First n rows
head(data, n=2)
 
# Last six rows
tail(data)
 
# Last n rows
tail(data, n=2)
 
# Rows 3 to 5
data[3:5, ]
 
# Rows 3 to 5 with Columns 2 to 4
data[3:5,2:4]
 
# Rows 3 to 5 with Columns 2 and 4
data[3:5,c(2,4)]
 
#--------------------------
# Basic Statistics  
#--------------------------
 
# Mean of a column
mean(data$Units_Shipped_Q1)
 
# Median of a column
median(data$Units_Shipped)
 
# Variance of a column
var(data$Units_Shipped)
 
# Standard Deviation of a column
sd(data$Market_Share_Q1)
  
# Maximum value from a column
max(data$Units_Shipped_Q1)
 
# Minimum value from a column
min(data$Units_Shipped_Q1)
 
# Range- Minimum and Maximum values of a column
range(data$Units_Shipped_Q1)
 
# Quantile or percentile for a column
quantile(data$Units_Shipped_Q1)
 
# Number of Observations in a column or length of a column
length(data$Units_Shipped_Q1)
 
# Market_Share_Q1 value in the row where Units_Shipped_Q1 is largest
data$Market_Share_Q1[which.max(data$Units_Shipped_Q1)]
 
# Market_Share_Q1 value in the row where Units_Shipped_Q1 is smallest
data$Market_Share_Q1[which.min(data$Units_Shipped_Q1)]
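If sales.csv is not at hand, the same functions can be tried on a small stand-in. The sketch below builds a hypothetical data frame with `read.csv(text = ...)`; only the column names `Units_Shipped_Q1` and `Market_Share_Q1` come from the tutorial, while the `Company` column and all values are invented.

```r
# Hypothetical stand-in for sales.csv (values invented for illustration)
csv_text <- "Company,Units_Shipped_Q1,Market_Share_Q1
A,120,0.30
B,80,0.20
C,150,0.35
D,50,0.15"
data <- read.csv(text = csv_text)

str(data)
mean(data$Units_Shipped_Q1)       # 100
median(data$Units_Shipped_Q1)     # 100
range(data$Units_Shipped_Q1)      # 50 150

# Company in the row with the most units shipped
data$Company[which.max(data$Units_Shipped_Q1)]

# A missing value makes mean() return NA unless na.rm = TRUE
with_na <- c(data$Units_Shipped_Q1, NA)
mean(with_na)
mean(with_na, na.rm = TRUE)
```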

Charts and Graphs

 #--------------------------
# Pie Chart
#--------------------------
# Syntax - pie(x, labels, radius, main, col, clockwise)
 
# Create data
vedang <- c(10, 20, 30, 50, 70)
label <- c("Paris", "London", "New York", "Sydney", "Moscow")
 
# Plot the pie chart
pie(vedang,label)
 
# Add main text and colors
pie(vedang,label, main = "This is the main text", col = rainbow(length(vedang)))
 
# Add a legend (cex scales the legend text size)
pie(vedang, label, main = "This is the main text", col = rainbow(length(vedang)))
legend("topright", label, cex = 0.7, fill = rainbow(length(vedang)))
 
 
# To save the chart as an image in the working directory,
# copy the chart currently on screen to a PNG file
dev.copy(png, file = "pie_chart.png")
# Close the PNG device, which writes the file
dev.off()
 
#--------------------------
# 3D Pie Chart
#--------------------------
# Install the plotrix package (needed only once)
install.packages("plotrix")
 
# Load the package
library(plotrix)
 
# Plot the 3D chart
pie3D(vedang, labels = label, explode = 0.5, main = "This is the main text", col = rainbow(length(vedang)))
legend("topright", label, cex = 0.7, fill = rainbow(length(vedang)))
 
 
# To save the chart as an image in the working directory,
# copy the chart currently on screen to a PNG file
dev.copy(png, file = "3D_pie_chart.png")
# Close the PNG device, which writes the file
dev.off()
 
#--------------------------
# Bar Plot
#--------------------------
#Syntax - barplot(H, xlab, ylab, main, names.arg, col)
 
# Plot the bar chart
barplot(vedang)
 
# Add features to the bar chart: Names argument/ label, X and Y axis labels, main text, colors and legend
barplot(vedang, names.arg = label, xlab = "Cities", ylab = "Score", main = "This is the main text", col = rainbow(length(vedang)))
legend("topleft", label, cex = 0.7, fill = rainbow(length(vedang)))
 
 
# To save the chart as an image in the working directory,
# copy the chart currently on screen to a PNG file
dev.copy(png, file = "bar_chart.png")
# Close the PNG device, which writes the file
dev.off()
 
#--------------------------
# Stacked Bar Chart
#--------------------------
 
# Create data
cities <- c("Paris", "London", "New York", "Sydney", "Moscow")
sport <- c("Soccer", "Rugby", "Baseball")
 
# Create the matrix of the values.
values <- matrix(c(3,7,1,8,11,7,2,4,13,10,15,2,9,9,4), nrow = 3, ncol = 5, byrow = TRUE)
 
# Plot the Stacked bar chart
barplot(values)
 
# Add features to the bar chart
barplot(values, main = "Score value", names.arg = cities, xlab = "Cities", ylab = "Score", col = rainbow(length(sport)))
legend("topleft", sport, cex = 0.7, fill = rainbow(length(sport)))
 
 
# To save the chart as an image in the working directory,
# copy the chart currently on screen to a PNG file
dev.copy(png, file = "Stacked_bar_chart.png")
# Close the PNG device, which writes the file
dev.off()
 
#--------------------------
# Histogram
#--------------------------
# Syntax - hist(v, main, xlab, xlim, ylim, breaks, col, border)
 
# Create data
vedang <-  c(4,2,7,5,9,2,6,8,9,2,9)
 
# Plot Histogram
hist(vedang)
 
# Add features to the Histogram
hist(vedang, main = "Scores", xlab = "Weight", ylab = "Frequency", col= "red")
 
# xlim and ylim set the axis ranges; breaks suggests the number of bins (R may adjust it)
hist(vedang, xlab = "Weight", ylab = "Frequency", col = "red", border = "yellow", xlim = c(0,12), ylim = c(0,10), breaks = 2)
 
 
# To save the chart as an image in the working directory,
# copy the chart currently on screen to a PNG file
dev.copy(png, file = "histogram.png")
# Close the PNG device, which writes the file
dev.off()
 
#--------------------------
# Line plot
#--------------------------
#Syntax - plot(v, type, col, xlab, ylab)
 
# Create data
vedang <- c(7, 14, 9, 20, 26, 19, 31)
plot(vedang, type = "o", col = "red", xlab = "Player", ylab = "Score")
 
# type "p" - only points
# type "l" - only lines
# type "o" - both points and lines
 
# Add a second line to the existing plot
line2 <- c(13, 9, 17, 22, 10)          
lines(line2, type = "o", col = "blue")
 
 
# To save the chart as an image in the working directory,
# copy the chart currently on screen to a PNG file
dev.copy(png, file = "LineChart.png")
# Close the PNG device, which writes the file
dev.off()
 
#--------------------------
# Scatter plot
#--------------------------
#Syntax - plot(x, y, main, xlab, ylab, xlim, ylim, axes)
 
a <- c(2,6,4,5,7,9,15,16,17,3,14)
b <- c(6,13,3,6,8,9,5,4,3,14,17)
plot(a,b)
 
 
# To save the chart as an image in the working directory,
# copy the chart currently on screen to a PNG file
dev.copy(png, file = "ScatterPlot.png")
# Close the PNG device, which writes the file
dev.off()
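Another way to save a chart is to open the file device first, draw into it, and then close it. The hypothetical helper `save_chart()` below wraps that pattern. It uses `pdf()` so it runs on machines without PNG support; `png()` can be swapped in the same way. The output path in the temporary directory is also an assumption for the example.

```r
# Hypothetical helper: open a file device, draw, then always close the device
save_chart <- function(file, draw) {
  pdf(file)             # 1. open the file device first
  on.exit(dev.off())    # 3. close it (writing the file), even if draw() fails
  draw()                # 2. draw the chart into the file
  invisible(file)
}

vedang <- c(10, 20, 30, 50, 70)
label  <- c("Paris", "London", "New York", "Sydney", "Moscow")

out <- file.path(tempdir(), "bar_chart.pdf")
save_chart(out, function() {
  barplot(vedang, names.arg = label, col = rainbow(length(vedang)),
          main = "This is the main text")
})
file.exists(out)
```

Opening the device before drawing avoids the common mistake of calling `png()` after the plot, which leaves the saved file empty.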
