Zero to R in 30 minutes

What is R?

R is a free, open-source programming language for statistical computing and data visualization. It is cross-platform, which means it can be used on Windows, macOS, Linux, and other systems.

Why is it used?

In simple terms, R helps you explore a dataset and draw useful inferences from it, which can then be used to make better decisions. Its main uses are:

Data Manipulation: To shape the dataset into the required format.

Data Analysis: Over 4000 packages are available for statistical analysis, including hypothesis testing, model fitting, clustering techniques, and machine learning.

Data Visualisation: Animated and interactive graphs can be created using R.
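
As a rough illustration of these three uses, the sketch below works on mtcars, a small dataset that ships with base R. It is not part of this tutorial's data and is assumed here only so that the lines run without any download.

```r
# Illustration only: mtcars is a built-in dataset, not this tutorial's sales.csv

# Data manipulation: keep the 4-cylinder cars and only two columns
small_cars <- subset(mtcars, cyl == 4, select = c(mpg, wt))
head(small_cars)

# Data analysis: average fuel economy for each cylinder count
mean_mpg <- tapply(mtcars$mpg, mtcars$cyl, mean)
mean_mpg

# Data visualisation: fuel economy against weight
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```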

How to use R?

Download and install RGui from https://cran.r-project.org

Beginners are usually better off with an IDE (Integrated Development Environment) such as RStudio, which can be downloaded from the RStudio website.

To understand what an IDE is, compare MS Word with Notepad. Both let you write text, but Word offers a friendlier interface together with several useful tools.

What statistical operations can it perform?

R can perform a wide range of statistical operations, including:

Mean, Median, Mode, Standard Deviation, Variance, etc.
Regression
ANOVA
Binomial Distribution
Chi-square test
Analysis of covariance
Random Forest
Survival Analysis
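
A few of the operations listed above can be tried directly in base R. The sketch below is only an illustration: it assumes the built-in mtcars dataset, and the counts passed to the chi-square test are invented for the example.

```r
# Regression: fuel economy (mpg) as a linear function of weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)

# ANOVA: does mean fuel economy differ between cylinder counts?
anova_fit <- aov(mpg ~ factor(cyl), data = mtcars)
summary(anova_fit)

# Chi-square goodness-of-fit test on invented counts
# (null hypothesis: all three categories are equally likely)
counts <- c(20, 30, 50)
chi <- chisq.test(counts)
chi$p.value

# Binomial distribution: probability of exactly 3 heads in 10 fair tosses
p3 <- dbinom(3, size = 10, prob = 0.5)
p3
```

Random forests and survival analysis are not shown because they typically rely on add-on packages rather than on the functions loaded by default.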

What type of graphs can it produce?

Different types of graphs can be created using R, including line-plot, box-plot, histogram, density curve, scatter plot, bar plot, etc.
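
The tutorial further down covers pie charts, bar plots, histograms, line plots and scatter plots. For the two remaining types mentioned here, a minimal sketch using the built-in mtcars dataset (an assumption made only for this illustration) could look like this:

```r
# Box plot: distribution of fuel economy for each cylinder count
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")

# Density curve: smoothed distribution of fuel economy
d <- density(mtcars$mpg)
plot(d, main = "Density of mpg")
```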

How to code in R?

A hands-on tutorial with guided comments is provided in three parts:
1. Basic Operations
2. Basic Statistics
3. Charts and graphs

Basic Operations

 # ----------------------
 # BASIC OPERATIONS IN R
 # ----------------------
  
 # Clear the console with CTRL + L (Windows and Ubuntu) or Option + Command + L (Mac)
  
 # Anything written after the hash mark is a comment
  
 # Simple math
  
 # Assign values
 x <- 10; y <- 3;
  
 # Add the assigned values
 x + y
  
 # Subtract the assigned values
 x-y 
  
 # Multiplication
 x*y
  
 # Division
 x/y
  
 # Exponentiation
 x^y
 x**y
  
 # Remainder or Modulus
 x%%y
  
 # Integer division: returns the whole-number part of the quotient
 x%/%y
  
 # Log and exponentials 
 vedang <- (1:10)
 vedang
  
 # Natural logarithm 
 log(vedang)
  
 # The result can also be assigned to a variable, v, and then printed:
 v <- log(vedang)
 v
  
 # exponential 
 exp(vedang)
  
 # Logarithm with base setting 
 log(vedang, base = 10)
  
 # Square root 
 sqrt(9)
  
 # Factorial
 factorial(5)
  
 # Combination- nCr
 n = 6 ; r  = 2
 vedang = choose(n,r)
 vedang
 ved = choose(n,n-r)
 ved
 # (both values are the same, because nCr = nC(n-r))
  
 # Rounding off
 v = 326.358
  
 # Rounding off the decimal places
 round(v,digits = 2)
  
 # Floor value
 floor(v)
  
 # Ceiling value
 ceiling(v)
  
 # Truncation 
 trunc(v)
  
 # ----------------------
 # VECTORS 
 # ----------------------
  
 # Clear the console with CTRL + L (Windows and Ubuntu) or Option + Command + L (Mac)
  
 # Defining a Vector 
 Vector1 <- c(10,2,1,5,7,4)
 Vector1
  
 # Vector with an incrementing sequence of numbers
 Vector2 <- seq(from = 2.5, to = 5.0, by = 0.5)
 Vector2
  
 # Linear operation 
 10* Vector1 + 2*Vector2 
  
 # Combine the two vectors into one
 c(Vector1,Vector2)
  
 # Repetition
 vedang <- c(5,1,3,5)
 rep(vedang,times=4)
  
 # Equally spaced vector with a defined length
 seq(from = 1.8, to = 9, length.out = 6)
  
 # ----------------------
 #  Logical Vectors 
 # ----------------------
  
 # Clear the console with CTRL + L (Windows and Ubuntu) or Option + Command + L (Mac)
  
 #  Scores of a game by two players- P1 and P2
 P1 <- c(12,41,78,24,75)
 P2 <- c(11,46,87,72,96)
  
 # P1 wins when the score of P1 is more than that of P2 
 P1.win <- P1 > P2
 P1.win
  
 # P2 wins when the score of P2 is more than that of P1
 P2.win <- P2 > P1
 P2.win
  
 # Which games did P1 win?
 which(P1.win)
  
 # Which games did P2 win?
 which(P2.win)
  
 # Score of P1 when it won
 P1[P1.win]
  
 # Score of P2 when it won
 P2[P2.win]
  
 # Number of matches won by P1 and by P2
 sum(P1.win)
 sum(P2.win)
  
 # Did P2 win any or all of the matches?
 any(P2.win)
 all(P2.win)
  
 # ----------------------
 # Text and Strings
 # ----------------------
  
 # Clear the console with CTRL + L (Windows and Ubuntu) or Option + Command + L (Mac)
  
 # Define a string inside double quotes
 v <- "My name is Vedang"
  
 # Length of the vector (1, because v holds a single string)
 length(v)
  
 # Number of characters
 nchar(v)
 # Please note that space is also a character
  
 # Vector with different strings 
 v  <- c("My","name","is","Vedang")
  
 # Length of the vector (number of strings it holds)
 length(v)
  
 # Number of characters in each string
 nchar(v)
  
 # ----------------------
 # MATRICES 
 # ----------------------
  
 # Define a matrix with 3 rows and 2 columns, filled row by row
 Vedang <- matrix(c(4, 5, 6, 7, 8, 9), nrow=3,ncol=2,byrow=TRUE)
 # Print the matrix
 Vedang
  
 # Define a matrix with 3 rows and 2 columns, filled column by column
 Vedang <- matrix(c(4, 5, 6, 7, 8, 9), nrow=3,ncol=2,byrow=FALSE)
 # Print the matrix
 Vedang
  
 # Print the element in the 3rd row and 2nd column
 Vedang[3,2] 
  
 # Print the 3rd row 
 Vedang[3,] 
  
 # Print the 2nd column
 Vedang[,2]      
  
 # Label the Row and Column dimensions
 dimnames(Vedang) = list(c("row1", "row2","row3"),c("col1", "col2"))
 Vedang
  
 # ----------------------
 # LISTS 
 # ----------------------
  
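 # Build a list from vectors of different types and a single number
 n <- c(1, 2, 3)
 s <- c("qw", "er", "ty", "ui", "op")
 b <- c(TRUE, FALSE, TRUE)
 vedang <- list(n, s, b, 77)
 vedang
  
 # Extract the 2nd and 4th elements as a sub-list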
 vedang[c(2, 4)]
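
One detail of list indexing is easy to miss: single brackets return a smaller list, while double brackets return the element itself. The short sketch below illustrates the difference; the names items and named are hypothetical and chosen only for this example.

```r
# Hypothetical list, similar in shape to the one above
items <- list(c(1, 2, 3), c("qw", "er"), c(TRUE, FALSE), 77)

# Single brackets return a list (here, a list of length 1)
sub <- items[2]
class(sub)      # "list"

# Double brackets return the element itself
elem <- items[[2]]
class(elem)     # "character"

# Elements of a named list can also be reached with $
named <- list(scores = c(1, 2, 3), label = "demo")
named$scores
```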

Basic Statistical Operations
Download the required CSV file, sales.csv, and store it in your working directory before running the code below.

 #--------------------------
 # Set Directory 
 #--------------------------
  
 # Set working directory
 setwd("C:/Users/vedangvatsa/documents")
 # Use the directory where you have stored the CSV file
  
 # Print working directory
 print(getwd())
  
 # Read the data file which should be in the set working directory
 data <- read.csv(file='sales.csv')
  
 
 #--------------------------
 # Basics   
 #--------------------------
  
 # Summary of the descriptive statistics and frequencies
 summary(data) 
  
 # Open the dataset in the data editor (interactive sessions only)
 edit(data) 
  
 # Structure of the dataset
 str(data) 
  
 # List of variables in the dataset
 names(data)
  
 # First few rows of the dataset (6 by default)
 head(data) 
  
 # First n rows
 head(data, n=2)
  
 # Last few rows 
 tail(data) 
  
 # Last n rows
 tail(data, n=2)
  
 # Rows 3 to 5 
 data[3:5, ]
  
 # Rows 3 to 5 with Columns 2 to 4 
 data[3:5,2:4] 
  
 # Rows 3 to 5 with Columns 2 and 4 
 data[3:5,c(2,4)] 
  
 #--------------------------
 # Basic Statistics   
 #--------------------------
  
 # Mean of a column
 mean(data$Units_Shipped_Q1) 
  
 # Median of a column
 median(data$Units_Shipped_Q1)
  
 # Variance of a column
 var(data$Units_Shipped_Q1)
  
 # Standard Deviation of a column
 sd(data$Market_Share_Q1)
    
 # Maximum value from a column
 max(data$Units_Shipped_Q1) 
  
 # Minimum value from a column
 min(data$Units_Shipped_Q1)
  
 # Range- Minimum and Maximum values of a column
 range(data$Units_Shipped_Q1)
  
 # Quantile or percentile for a column
 quantile(data$Units_Shipped_Q1)
  
 # Number of Observations in a column or length of a column
 length(data$Units_Shipped_Q1)
  
 # Market share in the row where Units_Shipped_Q1 is largest
   data$Market_Share_Q1[[which.max(data$Units_Shipped_Q1)]]
  
 # Market share in the row where Units_Shipped_Q1 is smallest
   data$Market_Share_Q1[[which.min(data$Units_Shipped_Q1)]]
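
If sales.csv is not at hand, the same calls can be tried on a small stand-in data frame. Everything in the sketch below is an assumption made for illustration: the Company column and all the values are invented, and only the two column names used above are taken from the tutorial. The sketch also adds a helper for the statistical mode, since the built-in mode() function reports an object's storage type rather than its most frequent value; stat_mode is a hypothetical name.

```r
# Hypothetical stand-in for sales.csv; all values are invented
data <- data.frame(
  Company          = c("A", "B", "C", "D", "E"),
  Units_Shipped_Q1 = c(120, 85, 120, 60, 95),
  Market_Share_Q1  = c(25.0, 17.7, 25.0, 12.5, 19.8)
)

# The same calls as above now run without the file
mean(data$Units_Shipped_Q1)
sd(data$Market_Share_Q1)

# Statistical mode: the most frequent value of a numeric vector
# (if several values tie, the smallest of them is returned)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}
stat_mode(data$Units_Shipped_Q1)
```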

Charts and Graphs

 #--------------------------
 # Pie Chart
 #--------------------------
 # Syntax - pie(x, labels, radius, main, col, clockwise)
  
 # Create data 
 vedang <- c(10, 20, 30, 50, 70)
 label <- c("Paris", "London", "New York", "Sydney", "Moscow")
  
 # Plot the pie chart
 pie(vedang,label)
  
 # Add main text and colors
 pie(vedang,label, main = "This is the main text", col = rainbow(length(vedang)))
  
 # Add a legend (cex scales the size of the legend text)
 pie(vedang, label, main = "This is the main text", col = rainbow(length(vedang)))
 legend("topright", label, cex = 0.7, fill = rainbow(length(vedang)))
  
  
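 # To save the chart as an image in the working directory,
 # open the PNG device first and give the file a name
 png(file = "pie_chart.png")
 # Draw the chart again so that it is written to the file
 pie(vedang, label, main = "This is the main text", col = rainbow(length(vedang)))
 legend("topright", label, cex = 0.7, fill = rainbow(length(vedang)))
 # Save and close the file.
 dev.off()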
  
 #--------------------------
 # 3D Pie Chart
 #--------------------------
 # Install the plotrix package (needed only once)
 install.packages("plotrix")
  
 # Get the library.
 library(plotrix)
  
 # Plot the 3D chart
 pie3D(vedang, labels = label, explode = 0.5, main = "This is the main text")
 legend("topright", label, cex = 0.7, fill = rainbow(length(vedang)))
  
  
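 # To save the chart as an image in the working directory,
 # open the PNG device first and give the file a name
 png(file = "3D_pie_chart.png")
 # Draw the chart again so that it is written to the file
 pie3D(vedang, labels = label, explode = 0.5, main = "This is the main text")
 legend("topright", label, cex = 0.7, fill = rainbow(length(vedang)))
 # Save and close the file.
 dev.off()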
  
 #--------------------------
 # Bar Plot
 #--------------------------
 #Syntax - barplot(H, xlab, ylab, main, names.arg, col)
  
 # Plot the bar chart 
 barplot(vedang)
  
 # Add features to the bar chart: bar labels (names.arg), X and Y axis labels, main text, colors and legend
 barplot(vedang, names.arg = label, xlab = "Cities", ylab = "Score", main = "This is the main text", col = rainbow(length(vedang)))
 legend("topleft", label, cex = 0.7, fill = rainbow(length(vedang)))
  
  
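 # To save the chart as an image in the working directory,
 # open the PNG device first and give the file a name
 png(file = "bar_chart.png")
 # Draw the chart again so that it is written to the file
 barplot(vedang, names.arg = label, xlab = "Cities", ylab = "Score", main = "This is the main text", col = rainbow(length(vedang)))
 # Save and close the file.
 dev.off()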
  
 #--------------------------
 # Stacked Bar Chart
 #--------------------------
  
 # Create data 
 cities <- c("Paris", "London", "New York", "Sydney", "Moscow")
 sport <- c("Soccer", "Rugby", "Baseball")
  
 # Create the matrix of the values.
 values <- matrix(c(3,7,1,8,11,7,2,4,13,10,15,2,9,9,4), nrow = 3, ncol = 5, byrow = TRUE)
  
 # Plot the Stacked bar chart
 barplot(values)
  
 # Add features to the bar chart (one color per sport, so that the legend matches the stacks)
 barplot(values, main = "Score value", names.arg = cities, xlab = "Cities", ylab = "Score", col = rainbow(length(sport)))
 legend("topleft", sport, cex = 0.7, fill = rainbow(length(sport)))
  
  
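 # To save the chart as an image in the working directory,
 # open the PNG device first and give the file a name
 png(file = "Stacked_bar_chart.png")
 # Draw the chart again so that it is written to the file
 barplot(values, main = "Score value", names.arg = cities, xlab = "Cities", ylab = "Score", col = rainbow(length(sport)))
 legend("topleft", sport, cex = 0.7, fill = rainbow(length(sport)))
 # Save and close the file.
 dev.off()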
  
 #--------------------------
 # Histogram
 #--------------------------
 # Syntax - hist(v, main, xlab, xlim, ylim, breaks, col, border)
  
 # Create data
 vedang <-  c(4,2,7,5,9,2,6,8,9,2,9)
  
 # Plot Histogram
 hist(vedang)
  
 # Add features to the Histogram
 hist(vedang, main = "Scores", xlab = "Weight", ylab = "Frequency", col= "red")
  
 # Set the x- and y-axis ranges with xlim and ylim; breaks suggests the number of bins
 hist(vedang, xlab = "Weight", ylab = "Frequency", col = "red", border = "yellow", xlim = c(0,12), ylim = c(0,10), breaks = 2)
  
  
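 # To save the chart as an image in the working directory,
 # open the PNG device first and give the file a name
 png(file = "histogram.png")
 # Draw the chart again so that it is written to the file
 hist(vedang, main = "Scores", xlab = "Weight", ylab = "Frequency", col = "red")
 # Save and close the file.
 dev.off()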
  
 #--------------------------
 # Line plot
 #--------------------------
 #Syntax - plot(v, type, col, xlab, ylab)
  
 # Create data
 vedang <- c(7, 14, 9, 20, 26, 19, 31)
 plot(vedang, type = "o", col = "red", xlab = "Player", ylab = "Score")
  
 # type "p" - only points
 # type "l" - only lines
 # type "o" - both points and lines
  
 # Add a second line to the same plot
 line2 <- c(13, 9, 17, 22, 10)           
 lines(line2, type = "o", col = "blue")
  
  
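 # To save the chart as an image in the working directory,
 # open the PNG device first and give the file a name
 png(file = "LineChart.png")
 # Draw the chart again so that it is written to the file
 plot(vedang, type = "o", col = "red", xlab = "Player", ylab = "Score")
 lines(line2, type = "o", col = "blue")
 # Save and close the file.
 dev.off()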
  
 #--------------------------
 # Scatter plot
 #--------------------------
 #Syntax - plot(x, y, main, xlab, ylab, xlim, ylim, axes)
  
 a <- c(2,6,4,5,7,9,15,16,17,3,14)
 b <- c(6,13,3,6,8,9,5,4,3,14,17)
 plot(a,b)
  
  
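 # To save the chart as an image in the working directory,
 # open the PNG device first and give the file a name
 png(file = "scatter_plot.png")
 # Draw the chart again so that it is written to the file
 plot(a, b)
 # Save and close the file.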
 dev.off()
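
Saving a chart always follows the same pattern: open a graphics device, draw, then close the device. One possible way to avoid repeating those steps is a small wrapper function. The sketch below is only a suggestion; save_png is a hypothetical helper, not a base R function, and the file is written to a temporary folder so that the example leaves the working directory untouched.

```r
# save_png is a hypothetical helper, not part of base R
save_png <- function(filename, draw) {
  png(file = filename)   # open the PNG device
  on.exit(dev.off())     # close the device even if drawing fails
  draw()                 # run the plotting code passed in
  invisible(filename)
}

# Usage: pass the plotting code as a function
out <- file.path(tempdir(), "scatter_plot.png")
save_png(out, function() plot(c(2, 6, 4, 5), c(6, 13, 3, 6)))
file.exists(out)
```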
