Plotting Graphs with R

One of the best features of the R analytical platform is how easily we can build publication-ready graphs. Many other statistical platforms are good at data crunching, but their graphical abilities are so poor that a business analyst is forced to export the data into Microsoft Excel or another spreadsheet program, manipulate it and create the graph there, and finally paste it into PowerPoint or another presentation tool.

Fortunately, with R you can not only create very good graphs, you can create them very easily, without cluttering your work with a lot of syntax.

For example, the simple plot() function can build a graph for an entire dataset.

> data(AirPassengers)

> str(AirPassengers)

Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 …

> plot(AirPassengers)

#This is in the form of a time series plot.

 

#Let us see plotting for multiple variables

> data(iris)

> plot(iris)

#This plots every pair of variables in the iris data set against each other, as a scatterplot matrix.

 

Suppose we want to plot only a certain variable within a dataset. The nomenclature in R is quite efficient in terms of space. Just as the SAS language uses libraryname.datasetname and then a var statement to refer to certain variables, R just uses the $ sign, as in dataset$variablename.

This is because R is very flexible due to its object-oriented nature: a data frame is an object whose columns can be extracted by name.
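
As a minimal sketch of the $ notation, using the built-in iris dataset loaded earlier, the following plots a single column. The choice of Sepal.Length and the axis labels are illustrative assumptions, not part of the original example.

```r
# Load the built-in iris data frame
data(iris)

# Extract one column by name with the $ operator
sepal_length <- iris$Sepal.Length

# Plot that single variable against its row index
plot(sepal_length, xlab = "Observation", ylab = "Sepal length (cm)")
```

Any other column of the data frame can be substituted in the same way.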

 

Pie Charts

When it comes to evaluating market share at a particular instant, a pie chart is simple to understand. At most two pie charts are needed to compare two different snapshots, but three or more pie charts of the same data at different points in time is definitely a poor use of pie charts.

In R you can create a pie chart just by using pie(dataset$variable), where the variable holds the non-negative quantities for the slices.
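
A small sketch of pie() follows. Since the article provides no market share data, it assumes the built-in mtcars dataset as a stand-in and charts the share of cars by number of cylinders; the title is illustrative only.

```r
# Load the built-in mtcars data frame (a stand-in for market share data)
data(mtcars)

# Count how many cars have 4, 6, and 8 cylinders
cyl_counts <- table(mtcars$cyl)

# Draw one slice per cylinder count, sized by its share of the total
pie(cyl_counts, main = "Cars by number of cylinders")
```

Here table() turns the raw column into counts per category, which is the form pie() expects.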

Bar Plots and Histograms

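Histograms are not the same as bar charts, although they look similar: a histogram is a bar chart of frequencies, in which a numeric variable is grouped into intervals (bins) and each bar shows how many values fall in that interval.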

A bar chart shows rectangular bars with lengths proportional to the quantities being described. It helps you compare relative quantities across categories.

To create a histogram, the function is hist(). As you might guess, the function for a bar plot is barplot(), which takes a vector or matrix of bar heights.

Similarly, to create a box plot, the function is boxplot().
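
The three functions can be sketched together as follows, assuming the built-in iris dataset. The variable choices, titles, and labels are illustrative, not taken from the original article.

```r
# Load the built-in iris data frame
data(iris)

# Histogram: bins a numeric variable and plots the frequency of each bin
h <- hist(iris$Sepal.Length, main = "Sepal length", xlab = "Length (cm)")

# Bar plot: one bar per category, with heights taken from a table of counts
species_counts <- table(iris$Species)
barplot(species_counts, main = "Observations per species")

# Box plot: distribution of a numeric variable for each category
boxplot(Sepal.Length ~ Species, data = iris, ylab = "Sepal length (cm)")
```

Note that hist() works on the raw numeric values, while barplot() needs the counts computed first, here with table().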

Line Graphs

The line chart is one of the most commonly used charts in business analytics and metrics reporting. It consists of two variables plotted along the axes, with adjacent points joined by line segments. Most often used with a time series on the x-axis, line charts are simple to understand and use.
Variations on the line graph include fan charts for time series, which join a line chart of historical data with ranges of future projections. Another common variation is to fit a linear regression or trend line between the two variables and superimpose it on the graph.
The slope of the line chart shows the rate of change at a particular point, and can also be used to highlight areas of discontinuity or irregular change between the two variables.
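
The trend line variation can be sketched as follows, assuming the built-in cars dataset (speed and stopping distance). The use of lm() and abline() is one common way to do this, and the color and labels are illustrative.

```r
# Load the built-in cars data frame (columns: speed, dist)
data(cars)

# Scatter plot of stopping distance against speed
plot(cars$speed, cars$dist, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Fit a simple linear regression of distance on speed
fit <- lm(dist ~ speed, data = cars)

# Superimpose the fitted trend line on the existing plot
abline(fit, col = "red")
```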

A basic line graph is created by first using the plot() function to plot the points and then the lines() function to draw the lines between the points. Note that R is case sensitive, so the function names are written in lowercase.

> plot(cars)
> lines(cars, type="o", pch=20, lty=2, col="green")

Note that we have introduced the parameter col here, which stands for color. One of the best things about R is the sheer richness of the color schemes it offers. If you find it hard to choose colors that look good together, R has something known as color palettes. An additional article on using multiple colors in the same graph without customizing them yourself is available at http://www.decisionstats.com/using-color-palettes-in-r/
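
As a rough sketch of using a palette instead of hand-picking colors, the following assumes the built-in rainbow() palette function and the iris dataset. The variables plotted and the legend position are illustrative choices.

```r
# Load the built-in iris data frame
data(iris)

# Generate three colors from a built-in palette function
species_colors <- rainbow(3)

# Color each point by species, using the factor's integer codes as an index
plot(iris$Sepal.Length, iris$Petal.Length,
     col = species_colors[as.integer(iris$Species)], pch = 20,
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")

# Add a legend that maps each color back to its species
legend("topleft", legend = levels(iris$Species), col = species_colors, pch = 20)
```

Other built-in palette functions such as heat.colors() and terrain.colors() can be swapped in for rainbow() in the same way.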

To read more about graphs in R, please also see http://www.statmethods.net/graphs/index.html

 
