In an effort to reduce the amount of time that I spend searching the Internet for basic ggplot2 questions, I’m writing a brief overview of the very basics of the grammar of graphics as a reference that I can come back to for a quick refresher.^{1}
First things first: The RStudio ggplot2 Cheat Sheet most likely has everything you need to know, and the Cookbook for R provides solutions to common problems. To be honest, you might just be here for these links.
One last thing before we begin – make sure that you have installed, updated, and loaded the tidyverse package.
Now, let’s get started.
The General Structure of Graphs
Data
Pass your dataset to the data argument of ggplot(). This creates a coordinate system and defines the dataset that the layers will draw from.
Geom_Function
This adds a layer of geometric shapes (points, bars, lines) that represent the dataset. Some examples are:
geom_point(): creates a scatterplot
geom_bar(): creates a bar graph
geom_smooth(): creates a smooth line
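As a minimal sketch of how the data and geom pieces fit together, the snippet below builds one plot per geom listed above. It assumes the mpg dataset that ships with ggplot2 (not something this post specifies), and the object names are purely illustrative. The aes() calls are covered in the Mappings section.

```r
library(ggplot2)

# mpg is an example dataset bundled with ggplot2 (an assumption; use your own data)
p_base <- ggplot(data = mpg)  # data only: an empty coordinate system, no layers yet

# Each geom_ function adds one layer of shapes on top of the base plot
p_points <- p_base + geom_point(mapping = aes(x = displ, y = hwy))   # scatterplot
p_smooth <- p_base + geom_smooth(mapping = aes(x = displ, y = hwy))  # smooth line
p_bars   <- p_base + geom_bar(mapping = aes(x = class))              # bar graph of counts

p_points  # printing a plot object draws it
```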
Mappings
Mappings define how your variables are mapped to visual properties on the graph. They can be defined locally (inside of your geom_function) or globally (inside of the parent ggplot function). They are always defined by aesthetic properties, such as:
x: the variable to map on the x-axis
y: the variable to map on the y-axis
color/fill: the color of the data on the graph
alpha: the transparency of data on the graph (from 0, fully transparent, to 1, fully opaque)
shape: the shape (numbers 0 to 25) of points on the graph
size: the size of points on the graph (in mm)
stroke: the size of shape borders (in mm)
linetype: the type of line to display on the graph
Note: you can map additional variables to color, alpha, etc. in addition to x and y, although whether this is actually a good idea depends on your data.
Each type of geometric function has a different set of available mappings, which can be found in the help documentation (e.g., by typing ?geom_point). See the end of this post for quick mapping references.
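To make the global versus local distinction concrete, here is a small sketch, again assuming the bundled mpg dataset and illustrative object names. It also shows the difference between mapping an aesthetic to a variable (inside aes()) and setting it to a constant (outside aes()).

```r
library(ggplot2)

# Global mapping: x and y are defined once in ggplot() and inherited by every layer
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

# Local mapping: color is mapped to a third variable for the points layer only
p_local <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()

# Setting (not mapping) an aesthetic: constants go outside aes()
p_fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue", alpha = 0.5)
```

In p_local, the smooth line stays a single line because it never sees the color mapping.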
Stat
Stats, or statistical transformations, transform the data before it is graphed. Each geometric function has a default statistical transformation. The most common example is a bar graph, which computes and displays a count of a variable in the data.
You may need to define a stat in these cases:
 to override the default stat of a geometric function. For example, using stat = "identity" for geom_bar if you already have a frequency variable in the data.
 to override the default mapping from transformed variables to aesthetics. For example, using geom_bar to display a proportion.
 as an alternative to geom_function to build a layer for your graph (see the ggplot2 cheat sheet)
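The three cases above can be sketched as follows. The counts data frame is hypothetical, made up purely for illustration, and mpg is the example dataset bundled with ggplot2. The proportion example uses after_stat(), which needs ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical pre-counted data: one row per category with its frequency
counts <- data.frame(
  cut  = c("Fair", "Good", "Ideal"),
  freq = c(10, 25, 40),
  stringsAsFactors = FALSE
)

# 1. Override the default stat: bar heights come straight from the freq column
p_identity <- ggplot(data = counts) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")

# 2. Override the default mapping: show the computed proportion instead of the count
#    (group = 1 makes the proportions relative to the whole dataset)
p_prop <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class, y = after_stat(prop), group = 1))

# 3. Use a stat_ function instead of a geom_ function to build the layer
p_stat <- ggplot(data = mpg) +
  stat_count(mapping = aes(x = class))
```

For the first case, geom_col() is a shortcut that uses stat = "identity" by default.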
Position
Position is used mainly for bar charts to help with displaying data. When you use color or fill to map a third variable in your data to different colors, there are a number of ways to position the additional information on your graph. The options include:
 stack: the default; stacks the bars on top of one another
 identity: creates overlapping bars (not that useful, but if you’re doing it then use fill = NA)
 dodge: places bars next to one another (the most useful, in my opinion)
 fill: makes all of the stacked bars the same height, which helps when comparing proportions (if you don’t care about the y variable)
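Note: position = "jitter" (available through the shortcut geom_jitter()) is a useful position adjustment for scatter plots. It solves the problem of overplotting (where you have a lot of overlapping dots that aren’t visible) by adding a small amount of random noise to each point.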
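Here is a short sketch of the position options side by side, assuming the bundled mpg dataset with drv as the third variable. The object names are illustrative.

```r
library(ggplot2)

# A third variable (drv) mapped to fill; position decides how the groups are arranged
base <- ggplot(data = mpg, mapping = aes(x = class, fill = drv))

p_stack <- base + geom_bar()                    # default: stacked bars
p_dodge <- base + geom_bar(position = "dodge")  # bars side by side
p_fill  <- base + geom_bar(position = "fill")   # every stack scaled to height 1

# identity: bars overlap, so map drv to the outline color and leave the fill empty
p_identity <- ggplot(data = mpg, mapping = aes(x = class, color = drv)) +
  geom_bar(fill = NA, position = "identity")

# jitter: adds a little random noise so overlapping points become visible
p_jitter <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(position = "jitter")
```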
Coordinate Function
Most likely, you won’t need a coordinate function because the default Cartesian coordinate system will satisfy your needs. However, here are some common uses:
coord_flip(): switches the x and y axes
coord_fixed(): lets you define the ratio between your x and y axes (default: 1)
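A quick sketch of both coordinate functions, assuming the bundled mpg dataset and illustrative object names:

```r
library(ggplot2)

# coord_flip(): horizontal boxplots, handy when the category labels are long
p_flipped <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip()

# coord_fixed(): one unit on the x-axis is drawn the same length as one unit on the y-axis
p_fixed <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  coord_fixed(ratio = 1)
```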
Facet Function
Facets are subplots that are useful for visually separating your data by discrete variables. You can create facets in two main ways:
facet_wrap(): splits the plot by a single discrete variable
facet_grid(): splits the plot by a combination of two variables separated by ~
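The two facet functions might be used like this. The sketch assumes the bundled mpg dataset, and the choice of faceting variables is arbitrary.

```r
library(ggplot2)

base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# One discrete variable: one panel per class, wrapped into two rows
p_wrap <- base + facet_wrap(~ class, nrow = 2)

# Two variables: rows by drv, columns by cyl
p_grid <- base + facet_grid(drv ~ cyl)
```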
Titles, Labels, and Axes
Even though these aspects are not a part of the basic structure of graphs, they are among the most important. Nobody cares how great your graph looks if they don’t know what it’s meant to show.
The basics are best shown through example:
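As one possible illustration (a sketch, not necessarily the example this post originally showed), the snippet below sets a title, subtitle, caption, axis labels, and legend title with labs(), then adjusts the y-axis with scale_y_continuous(). The dataset (mpg) and all label text are assumptions chosen for the demonstration.

```r
library(ggplot2)

p_labeled <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  labs(
    title    = "Fuel economy falls as engine size grows",
    subtitle = "Each point is one car model",
    x        = "Engine displacement (liters)",
    y        = "Highway fuel economy (miles per gallon)",
    color    = "Vehicle class",  # legend title for the color aesthetic
    caption  = "Data: ggplot2::mpg"
  ) +
  # Axis limits and tick marks are controlled by the scale functions
  scale_y_continuous(limits = c(10, 45), breaks = seq(10, 45, by = 5))
```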
…and we’re finished! Not too bad, right?
I’ve included some useful references and example code below that illustrates the concepts of this post in practice.
Quick References
A reference for ggplot2 point shapes:
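If you want to draw a shape chart yourself, a sketch along these lines should work. It lays the shape numbers 0 to 25 out on a grid and labels each one. The data frame and its layout are made up for illustration.

```r
library(ggplot2)

# One row per shape number, arranged in a grid six columns wide
shapes <- data.frame(
  shape = 0:25,
  x = 0:25 %% 6,
  y = -(0:25 %/% 6)
)

p_shapes <- ggplot(data = shapes, mapping = aes(x = x, y = y)) +
  # fill only affects shapes 21 to 25, which have a separate border and interior
  geom_point(mapping = aes(shape = shape), size = 5, fill = "red") +
  geom_text(mapping = aes(label = shape), nudge_y = -0.35, size = 3) +
  scale_shape_identity() +  # use the numbers in the data as the shapes themselves
  theme_void()
```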
A reference for ggplot2 line types:
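A similar sketch draws one labeled segment for each of the six named line types that ggplot2 accepts. The data frame is again only an illustration.

```r
library(ggplot2)

# The six named line types, one per row
linetypes <- data.frame(
  y   = 1:6,
  lty = c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash"),
  stringsAsFactors = FALSE
)

p_linetypes <- ggplot(data = linetypes, mapping = aes(x = 0, y = y)) +
  geom_segment(mapping = aes(xend = 5, yend = y, linetype = lty)) +
  geom_text(mapping = aes(label = lty), hjust = 0, nudge_y = 0.2) +
  scale_linetype_identity() +  # use the names in the data as the line types themselves
  scale_x_continuous(NULL, breaks = NULL) +
  scale_y_reverse(NULL, breaks = NULL)
```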
Basic Examples in R code
I recommend copying and pasting this code into RStudio for ease of use.

1. This post is meant for a person who has used ggplot2 in the past and is looking for a brief summary of the basics. The content in this post is based on chapter three of R for Data Science by Hadley Wickham & Garrett Grolemund, which I would highly recommend reading in full if you have never used ggplot2 before.