In an effort to reduce the amount of time that I spend searching the Internet for answers to basic ggplot2 questions, I’m writing a brief overview of the basics of the grammar of graphics as a reference that I can come back to for a quick refresher.[1]
First things first: The RStudio ggplot2 Cheat Sheet most likely has everything you need to know, and the Cookbook for R provides solutions to common problems. To be honest, you might just be here for these links.
One last thing before we begin – make sure that you have installed, updated, and loaded the tidyverse package.
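A minimal setup sketch, assuming a CRAN mirror is configured. The install line is commented out so that it only runs when you need it:

```r
# Run once per machine; re-running it later installs the latest release.
# install.packages("tidyverse")

# Run at the start of every session. This attaches ggplot2 along with the
# other core tidyverse packages.
library(tidyverse)
```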
Now, let’s get started.
The General Structure of Graphs
Pass your data to the data argument of ggplot(). This creates a coordinate system and defines the dataset that the rest of the graph draws from.
A geom function adds a layer of geometric shapes (points, bars, lines) that represent the dataset. Some examples are:
geom_point() creates a scatterplot
geom_bar() creates a bar graph
geom_smooth() creates a smooth line
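Here is a minimal sketch of how the data and geom pieces fit together. It assumes the mpg dataset that ships with ggplot2; the object names (base, scatter, and so on) are only for illustration:

```r
library(ggplot2)  # also attached by library(tidyverse)

# ggplot() alone sets up the coordinate system and dataset.
# Printing it draws an empty canvas because there are no layers yet.
base <- ggplot(data = mpg)

# Each geom function adds one layer of shapes on top of that canvas.
scatter <- base + geom_point(mapping = aes(x = displ, y = hwy))
bars    <- base + geom_bar(mapping = aes(x = class))
smooth  <- base + geom_smooth(mapping = aes(x = displ, y = hwy))

# Layers can be stacked: points with a smooth line drawn over them.
both <- scatter + geom_smooth(mapping = aes(x = displ, y = hwy))

print(both)
```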
Mappings define how your variables are mapped to visual properties on the graph. They can be defined locally (inside of your geom_function) or globally (inside of the parent ggplot function). They are always defined by aesthetic properties, such as:
x: the variable to map on the x-axis
y: the variable to map on the y-axis
color/fill: the color of the data on the graph
alpha: the transparency of data on the graph (from 0, transparent to 1, opaque)
shape: the shape of points on the graph (numbered 0-25)
size: the size of points on the graph (in mm)
stroke: the size of shape borders (in mm)
linetype: type of line to display on the graph
Note: you can map additional variables to color, alpha, etc. in addition to x and y, although whether this is actually a good idea depends on your data.
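The difference between global and local mappings is easier to see in code. This is a sketch using the mpg dataset from ggplot2; the choice of variables is arbitrary:

```r
library(ggplot2)

# Global mapping: defined in ggplot(), inherited by every layer.
global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

# Local mapping: defined inside one geom, applies to that layer only.
# Here a third variable (class) is mapped to color.
local <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Setting rather than mapping: a fixed value goes outside aes().
fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue", alpha = 0.5)
```

Printing any of these objects, for example print(local), draws the graph.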
Each type of geometric function has a different set of available mappings, which can be found in the help documentation (e.g. by typing ?geom_point). See the end of this post for quick mapping references.
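Besides the help page, the available aesthetics can also be inspected from the console. This sketch assumes you are happy to look at ggplot2's exported GeomPoint object directly, which is a convenience rather than the documented way to look things up:

```r
library(ggplot2)

# In an interactive session, open the help page with:
#   ?geom_point

# Aesthetics that geom_point() requires.
GeomPoint$required_aes

# Optional aesthetics that geom_point() understands and has defaults for.
names(GeomPoint$default_aes)
```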
Stats, or statistical transformations, transform the data before it is graphed. Each geom function has a default stat. The most common example is geom_bar(), which computes and displays a count of a variable in the data.
You may need to define a stat in these cases:
- to override the default stat of a geom function. For example, using stat = "identity" with geom_bar() if you already have a frequency variable in the data.
- to override the default mapping from transformed variables to aesthetics. For example, using geom_bar() to display a proportion instead of a count.
- as an alternative to a geom function to build a layer for your graph (see the ggplot2 cheat sheet).
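The three cases above might look like this. It is a sketch, assuming ggplot2 3.3.0 or later (for after_stat() and the fun arguments of stat_summary()); the freq data frame is made up for illustration, and diamonds ships with ggplot2:

```r
library(ggplot2)

# Hypothetical pre-counted data: the bar heights are already in the table.
freq <- data.frame(
  cut = c("Fair", "Good", "Ideal"),
  n   = c(10, 25, 40)
)

# 1. Override the default stat: plot n as-is instead of counting rows.
precounted <- ggplot(data = freq) +
  geom_bar(mapping = aes(x = cut, y = n), stat = "identity")

# 2. Override the default mapping: show proportions instead of counts.
#    group = 1 makes the proportions sum to 1 across all bars.
proportion <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

# 3. Use a stat function in place of a geom function to build the layer:
#    one median point per cut, with a line from the minimum to the maximum.
summary_layer <- ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun = median, fun.min = min, fun.max = max
  )
```

For the first case, geom_col() is a shorthand that uses stat = "identity" by default.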
Position is used mainly for bar charts to help with displaying data. When you use color or fill to map a third variable in your data to different colors, there are a number of ways to position the additional information on your graph. The options include:
- stack: stacks the bars on top of one another (the default)
- identity: draws every bar from the axis, so the bars overlap (not that useful, but if you’re doing it then use fill = NA so the bars behind stay visible)
- dodge: places bars next to one another (the most useful, in my opinion)
- fill: stacks the bars and stretches every stack to the same height, which makes proportions easy to compare (if you don’t care about the raw counts)
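A sketch of the four position options side by side, using the diamonds dataset from ggplot2 with clarity as the third variable:

```r
library(ggplot2)

base <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity))

stacked <- base + geom_bar()                    # default: position = "stack"
dodged  <- base + geom_bar(position = "dodge")  # bars side by side
filled  <- base + geom_bar(position = "fill")   # every stack scaled to height 1

# identity: bars overlap, so drop the fill and outline them by color instead.
overlaid <- ggplot(data = diamonds, mapping = aes(x = cut, color = clarity)) +
  geom_bar(fill = NA, position = "identity")
```

Printing each object in turn, for example print(dodged), makes the differences obvious.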
For scatter plots, position = "jitter" is a useful position adjustment that solves the problem of overplotting (where you have a lot of overlapping dots that aren’t visible). geom_jitter() is a shorthand for geom_point(position = "jitter").
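A short sketch of jittering with the mpg dataset. The width and height values are arbitrary choices for illustration:

```r
library(ggplot2)

base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# Without jitter, points that share the same rounded values are drawn on top
# of each other.
overplotted <- base + geom_point()

# position = "jitter" adds a small amount of random noise to each point.
jittered <- base + geom_point(position = "jitter")

# geom_jitter() does the same thing; width and height control the noise.
jittered_short <- base + geom_jitter(width = 0.1, height = 0.5)
```

Because the noise is random, the jittered graphs look slightly different each time they are drawn.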
Most likely, you won’t need to add a coordinate function, because the default Cartesian coordinate system will satisfy your needs. However, here are some common uses:
coord_flip() switches the x and y axes
coord_fixed() lets you define the ratio between your x and y axes (default: 1)
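Both coordinate functions in a quick sketch, again assuming the mpg dataset:

```r
library(ggplot2)

boxes <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot()

# Swap the axes so the class labels run down the side of the graph.
flipped <- boxes + coord_flip()

# Fix the aspect ratio: one unit on the y-axis is drawn as long as one unit
# on the x-axis.
square <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  coord_fixed(ratio = 1)
```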
Facets are subplots that are useful for visually separating your data by discrete variables. You can create facets in two main ways:
facet_wrap() splits the plot by a single discrete variable
facet_grid() splits the plot by a combination of two variables separated by a tilde (~), with the row variable on the left and the column variable on the right
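A sketch of both facet functions using mpg. The faceting variables (class, drv, cyl) are just convenient discrete columns in that dataset:

```r
library(ggplot2)

points <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# One discrete variable: the panels wrap into rows and columns.
wrapped <- points + facet_wrap(~ class, nrow = 2)

# Two discrete variables: rows ~ columns.
gridded <- points + facet_grid(drv ~ cyl)
```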
Titles, Labels, and Axes
Even though these aspects are not a part of the basic structure of graphs, they are among the most important. Nobody cares how great your graph looks if they don’t know what it’s meant to show.
The basics are best shown through example:
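A minimal sketch of the usual pieces, assuming the mpg dataset from ggplot2. The label text, axis limits, and breaks are made up for illustration:

```r
library(ggplot2)

labelled <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  # Title, axis labels, legend title, and caption in one call.
  labs(
    title   = "Engine size and highway fuel economy",
    x       = "Engine displacement (litres)",
    y       = "Highway miles per gallon",
    color   = "Vehicle class",
    caption = "Data: ggplot2::mpg"
  ) +
  # Axis limits and tick marks.
  scale_y_continuous(limits = c(10, 45), breaks = seq(10, 45, by = 5)) +
  # Centre the title (it is left-aligned by default).
  theme(plot.title = element_text(hjust = 0.5))

print(labelled)
```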
…and we’re finished! Not too bad, right?
I’ve included some useful references and example code below that illustrate the concepts of this post in practice.
Basic Examples in R code
I recommend copying and pasting this code into RStudio for ease of use.
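As one hedged example that pulls the pieces of this post together, the sketch below builds a single graph with a global mapping, two geoms, a position adjustment, facets, a coordinate function, and labels. It assumes the mpg dataset from ggplot2, and every variable choice and label is illustrative:

```r
library(ggplot2)

combined <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  # Local mapping of a third variable, plus jitter to reduce overplotting.
  geom_point(mapping = aes(color = drv), position = "jitter", alpha = 0.7) +
  # A straight-line fit drawn over the points, without the confidence band.
  geom_smooth(method = "lm", se = FALSE) +
  # One panel per model year.
  facet_wrap(~ year) +
  # Zoom the y-axis without dropping any data.
  coord_cartesian(ylim = c(10, 45)) +
  labs(
    title = "Engine size and highway fuel economy by model year",
    x     = "Engine displacement (litres)",
    y     = "Highway miles per gallon",
    color = "Drive train"
  )

print(combined)
```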
[1] This post is meant for a person who has used ggplot2 in the past and is looking for a brief summary of the basics. The content in this post is based on chapter three of R for Data Science by Hadley Wickham & Garrett Grolemund, which I would highly recommend reading in full if you have never used ggplot2 before.