Plotting live charts with Yahoo Finance data and ggplot2 in R

Author: Dr Chibisi Chima-Okereke Created: May 17, 2013 12:14:00 GMT Published: May 22, 2013 05:34:00 GMT

The principal aim is simple: you would like to pull some stock price data from Yahoo Finance live (in this case it is delayed) and plot it in R. Of course you can choose to analyse the data live as well as plot it, but today we will keep it simple and focus on pulling the data from the Yahoo Finance CSV API and plotting it with ggplot2.

You could also choose to plot using Google charts, and if you are working with R, there is a great package googleVis by Markus Gesmann. His googleVis blog is located here. Incidentally, he is one of the organizers of the R In Insurance Conference.

We require just two packages here: ggplot2, and gridExtra, which extends the grid package and also allows us to arrange ggplot2 charts together.

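# Keep character columns as character rather than converting them to factors
options(stringsAsFactors = FALSE)
# library() stops with an error if a package is missing, unlike require()
library(ggplot2)
library(gridExtra)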


I put ‘live’ in quotation marks because it is made clear that the stock data is delayed. We will load the data using the CSV API, which really only involves creating the appropriate URL. A simple tutorial on creating this URL is given here and the property list of the items is given here.
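
As a rough sketch of what "creating the appropriate URL" amounts to, the query can be assembled from a symbol and a set of property tags. The tags and column names are the ones used later in this article; the base address, the helper name BuildQuoteUrl and the variable names are assumptions made for illustration, and nothing is downloaded here.

```r
# Column names paired with their CSV property tags (as used in this article)
vTags <- c(Name = "n", Symbol = "s", Ask = "b2", Bid = "b3", Volume = "v0")

# Assumed base address of the historical Yahoo Finance CSV endpoint
sBaseUrl <- "http://download.finance.yahoo.com/d/quotes.csv"

# Hypothetical helper: glue the symbol and the tags into one query string
BuildQuoteUrl <- function(sSymbol, vTags, sBase = sBaseUrl)
{
  paste(sBase, "?s=", sSymbol, "&f=", paste(vTags, collapse = ""),
        "&e=.csv", sep = "")
}

sUrl <- BuildQuoteUrl("GOOG", vTags)
print(sUrl)
```

Keeping the tags in a named vector means the column names and the query string cannot drift apart when a property is added or removed.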

The first thing I do is create an environment to store the data. This is not essential, but with processes like this it can be safer not to have the source data frame hanging around in the global environment, where it could be changed unexpectedly. Instead we are forced to get() and assign() values explicitly.

# Creating the environment
assign("envPrevData", new.env(), envir = .GlobalEnv)
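
To make the idea concrete, here is a minimal sketch of storing and retrieving an object in a separate environment. The names envDemo and dfDemo are hypothetical and are not used elsewhere in this article.

```r
# A private environment to hold the data (envDemo is a hypothetical name)
envDemo <- new.env()

# Store a small data frame under a name inside the environment
assign("dfDemo", data.frame(x = 1:3), envir = envDemo)

# The object lives in envDemo, not in the global environment
inEnv <- exists("dfDemo", envir = envDemo, inherits = FALSE)
inGlobal <- exists("dfDemo", envir = globalenv(), inherits = FALSE)

# Retrieve it explicitly when it is needed
dfBack <- get("dfDemo", envir = envDemo)
print(c(inEnv = inEnv, inGlobal = inGlobal))
```

Because the data can only be reached by naming the environment, an accidental assignment at the console cannot overwrite it.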


We then use the paste() function to generate our URL query and download it using the read.table() function. The time given for the stock quote is to the nearest minute, but we can download more frequently than this. I have chosen to use the system time, Sys.time(), on the machine, which will be the time just after I made the query. In addition, the time I am using is local (here BST), whereas "GOOG" is traded on NASDAQ, which runs on EDT.

Each call to this function will obtain a row of data ready to be appended to the main data frame.

# Function to get the data
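GetLiveData <- function(sSymbol = "GOOG")
{
  # Build the query URL; the base address is an assumption (the historical
  # Yahoo Finance CSV endpoint), the property tags are those used in this article
  sAddress <- paste("http://download.finance.yahoo.com/d/quotes.csv?s=",
                    sSymbol, "&f=nsb2b3v0&e=.csv", sep = "")
  # Download the single-row CSV response
  dfYahooData <- read.table(sAddress, sep = ",", header = FALSE)
  # I chose to create my own time
  Time <- Sys.time()
  # Appending the time to the data frame
  dfYahooData <- cbind(dfYahooData, "Time" = Time)
  names(dfYahooData) <- c("Name", "Symbol", "Ask", "Bid", "Volume", "Time")
  dfYahooData
}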
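
The time-zone point made above can be illustrated with a short sketch that renders one instant in both London and New York time. The fixed timestamp and the variable names are chosen purely for illustration, and the sketch assumes the system has the usual time-zone database available.

```r
# A fixed example instant in UK local time (BST in May)
tLocal <- as.POSIXct("2013-05-17 15:30:00", tz = "Europe/London")

# The same instant rendered in New York time, where NASDAQ trades
sExchange <- format(tLocal, tz = "America/New_York", usetz = TRUE)
print(sExchange)

# For the current time the same call can be applied to Sys.time():
# format(Sys.time(), tz = "America/New_York", usetz = TRUE)
```

Only the display changes; the underlying POSIXct value is the same instant, so the plots are unaffected by which zone is used for labelling.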


The stock update function

Once we have downloaded the data we need a function to ensure that it is appended to the main data frame. The other nice thing about working with environments is that any name you have not yet assigned comes back as NULL when accessed with $, and you can append to it. If dfStockData does not exist in the global environment, on the other hand, you cannot append data to it. For example, this

# Here dfCurr exists but dfStockData does not
rbind.data.frame(dfStockData, dfCurr)


would give an error since dfStockData does not exist, but if we create an environment envPrevData, we can just do this

# Here dfCurr exists but dfStockData does not
rbind.data.frame(envPrevData$dfStockData, dfCurr)


and carry on spontaneously appending or assigning things as we like. I take advantage of this to append the data to a data frame in the environment without worrying about initializing it first.

# Function to update the current data
UpdateStockData <- function(sSymbol = "GOOG")
{
  envPrevData <- get("envPrevData", envir = .GlobalEnv, mode = "environment")
  dfCurr <- GetLiveData(sSymbol = sSymbol)
  print(dfCurr)
  try(envPrevData$dfStockData <- rbind.data.frame(envPrevData$dfStockData, dfCurr))
  assign("envPrevData", envPrevData, envir = .GlobalEnv)
  invisible()
}


The plotting function

The plotting function is a bit of a beast, it has to be admitted. Just consider that there are three parts.

1. Some small data preparation to get the bid and ask movements. The midpoint is also calculated, though it and the bid movements are not used.
2. The first plot creates the bid-ask line range chart on the top.
3. The second plot creates the volume bar chart below the bid-ask chart.

# The plot function
plotChart <- function(){
  # Here we get the data
  envPrevData <- get("envPrevData", envir = .GlobalEnv, mode = "environment")
  dfStockData <- envPrevData$dfStockData
  # We only want to plot if we have five or more points
  if(nrow(dfStockData) > 4){
    # 1. We prepare the data
    AskMovement <- factor(sign(c(0, diff(dfStockData$Ask))), levels = c(-1, 0, 1),
                          labels = c("Down", "No Change", "Up"))
    BidMovement <- factor(sign(c(0, diff(dfStockData$Bid))), levels = c(-1, 0, 1),
                          labels = c("Down", "No Change", "Up"))
    VolumeChange <- c(0, diff(dfStockData$Volume))
    dfStockData <- data.frame(dfStockData, AskMovement, BidMovement, VolumeChange)
    dfStockData$Mid <- with(dfStockData, .5*(Bid + Ask))
    # Named colours so each movement keeps its colour even when a level is absent
    vColours <- c("Down" = "red", "No Change" = "blue", "Up" = "green")
    # 2. This is the first plot for the Bid-Ask
    # Assumed aesthetics: ymin/ymax span the bid-ask range, colour follows the ask movement
    bAPlot <- ggplot(dfStockData, aes(Time, Mid, ymin = Bid, ymax = Ask,
                                      colour = AskMovement))
    bAPlot <- bAPlot + geom_linerange(lwd = 1.5) + xlab("") + ylab("Price\n")
    bAPlot <- bAPlot + theme(legend.position = "top",
                             plot.margin = unit(c(0, .5, -1.5, 0), "lines"),
                             axis.text.y = element_text(angle = 90),
                             axis.text.x = element_blank()) +
      xlim(range(dfStockData$Time)) + scale_colour_manual(values = vColours)
    # 3. This is the Volume change plot (ggplot() + geom_bar() in place of qplot())
    VolPlot <- ggplot(dfStockData, aes(Time, VolumeChange, fill = AskMovement)) +
      geom_bar(stat = "identity")
    VolPlot <- VolPlot + theme(legend.position = "none",
                               plot.margin = unit(c(0, .5, 0, 0), "lines"),
                               axis.text.y = element_text(angle = 90)) +
      xlab("\nTime") + ylab("Volume Change\n") + xlim(range(dfStockData$Time)) +
      scale_fill_manual(values = vColours)
    # We use grid arrange to arrange the plots
    grid.arrange(bAPlot, VolPlot, nrow = 2, heights = c(1.5, 1))
  }
}
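
The data preparation in step 1 can be tried on its own without any plotting. The sketch below applies the same sign-of-difference idea to a handful of made-up ask prices; the values and the name vAsk are for illustration only.

```r
# Made-up ask prices, for illustration only
vAsk <- c(100.0, 100.5, 100.5, 100.2, 100.9)

# Sign of each change from the previous observation; the first point has no
# predecessor, so a leading zero marks it as "No Change"
vSign <- sign(c(0, diff(vAsk)))

# Turn the signs into a labelled factor with all three levels present
AskMovement <- factor(vSign, levels = c(-1, 0, 1),
                      labels = c("Down", "No Change", "Up"))
print(as.character(AskMovement))
```

Fixing the levels up front means the factor always has the same three categories, even when a short run of data contains no downward or upward moves.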


Executing the process

Finally, we bring everything together and run the code.

# Running the process
CurrTime <- Sys.time()
while(Sys.time() < CurrTime + 60*60){