Plotting live charts with Yahoo Finance data and ggplot2 in R

Author: Dr Chibisi Chima-Okereke Created: May 17, 2013 12:14:00 GMT Published: May 22, 2013 05:34:00 GMT

The principal aim is simple: to pull some stock price data from Yahoo Finance, ‘live’ (in practice it is delayed), and plot it in R. You could of course analyse the data as it arrives as well as plotting it, but today we will keep it simple and focus on pulling the data from the Yahoo Finance CSV API and plotting it with ggplot2.

You could also choose to plot using Google charts, and if you are working with R, there is a great package googleVis by Markus Gesmann. His googleVis blog is located here. Incidentally, he is one of the organizers of the R In Insurance Conference.

We require just two packages here: ggplot2 and gridExtra. As well as adding functionality to the grid package, gridExtra allows us to arrange ggplot2 charts together.

options(stringsAsFactors = FALSE)
require(ggplot2)
require(gridExtra)

Downloading stock quote data from the Yahoo Finance API

I put ‘live’ in quotes because Yahoo makes it clear that the stock data is delayed. We will load the data using the CSV API, which really only involves creating the appropriate URL. A simple tutorial on creating this URL is given here and the property list of the items is given here.

The first thing I do is create an environment to store the data. This is not essential, but with processes like this it can be safer not to leave the source data frame in the global environment, where it could be changed unexpectedly. Keeping it in its own environment forces us to get() and assign() values explicitly.

# Creating the environment
assign("envPrevData", new.env(), envir = .GlobalEnv)
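
As a minimal sketch of why an environment suits this job, the snippet below uses a separate scratch environment (envScratch and AddOne are hypothetical names, not part of the original code) to show get() and assign() in action, and that a function can modify an environment in place because environments are passed by reference.

```r
# A scratch environment, separate from the one used in this post
envScratch <- new.env()

# Store and retrieve a value by name
assign("x", 1:3, envir = envScratch)
get("x", envir = envScratch)

# Environments are passed by reference, so a function can
# change their contents without returning anything
AddOne <- function(env) {
  env$x <- env$x + 1L
  invisible(NULL)
}
AddOne(envScratch)
envScratch$x
```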

We then use the paste() function to generate our URL query and download it using the read.table() function. The time given for the stock quote is to the nearest minute, but we can download more frequently than this. I have chosen to use the system time, Sys.time(), on my machine, which will be the time just after the query was made. In addition, the time I am using is local (here, BST), whereas "GOOG" is traded on NASDAQ, which runs on EDT.

Each call to this function will obtain a row of data ready to be appended to the main data frame.

# Function to get the data
GetLiveData <- function(sSymbol = "GOOG")
{
  sAddress <- paste("http://download.finance.yahoo.com/d/quotes.csv?s=", 
      sSymbol, "&f=nsb2b3v0&e=.csv", sep = "")
  cat("Downloading data from ", sAddress, "\n")
  dfYahooData <- read.table(sAddress, sep = ",", header = F)
  # I chose to create my own time 
  Time <- Sys.time()
  # Appending the time to the data frame
  dfYahooData <- cbind(dfYahooData, "Time" = Time)
  # Adding the column names
  names(dfYahooData) <- c("Name", "Symbol", "Ask", "Bid", "Volume", "Time")
  dfYahooData
}
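
To see what the parsing step does without touching the network, here is a small sketch that feeds read.table() a made-up response line. The company name, symbol and numbers are invented for illustration; only the field order (name, symbol, ask, bid, volume) is taken from the query string above.

```r
# A made-up line in the shape the query string asks for
sSample <- '"Example Corp","EXMP",101.25,101.10,2345678'

# Parse it exactly as GetLiveData() parses the downloaded file
dfSample <- read.table(text = sSample, sep = ",", header = FALSE,
    stringsAsFactors = FALSE)
names(dfSample) <- c("Name", "Symbol", "Ask", "Bid", "Volume")

# Stamp the row with the local system time, as in the post
dfSample$Time <- Sys.time()
str(dfSample)
```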

The stock update function

Once we have downloaded the data we need a function to ensure that it is appended to the main data frame. The other nice thing about working with environments is that looking up a name that has not been assigned yet with $ returns NULL rather than an error, so you can append to it straight away. By contrast, if dfStockData does not exist in the global environment, you cannot append data to it. For example, this

# Here dfCurr exists but dfStockData does not
rbind.data.frame(dfStockData, dfCurr)

would give an error, since dfStockData does not exist. But if we create an environment envPrevData, we can just do this

# Here dfCurr exists but dfStockData does not
rbind.data.frame(envPrevData$dfStockData, dfCurr)

and carry on appending or assigning things as we like. I take advantage of this to append the data to a data frame in the environment without worrying about initializing it first.

# Function to update the current data
UpdateStockData <- function(sSymbol = "GOOG")
{
  envPrevData <- get("envPrevData", envir = .GlobalEnv, mode = "environment")
  dfCurr <- GetLiveData(sSymbol = sSymbol)
  print(dfCurr)
  try(envPrevData$dfStockData <- rbind.data.frame(envPrevData$dfStockData, dfCurr))
  assign("envPrevData", envPrevData, envir = .GlobalEnv)
  invisible()
}
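
The append-to-NULL behaviour that UpdateStockData() relies on can be checked in isolation. This is a sketch using a throwaway environment and a made-up one-row data frame (envScratch2 and the values in dfCurr are hypothetical).

```r
envScratch2 <- new.env()

# The name has never been assigned, so $ gives NULL rather than an error
is.null(envScratch2$dfStockData)

# A made-up row standing in for the output of GetLiveData()
dfCurr <- data.frame(Symbol = "EXMP", Ask = 101.25, Bid = 101.10)

# Appending works the first time (NULL is dropped) and every time after
envScratch2$dfStockData <- rbind.data.frame(envScratch2$dfStockData, dfCurr)
envScratch2$dfStockData <- rbind.data.frame(envScratch2$dfStockData, dfCurr)
nrow(envScratch2$dfStockData)
```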

The plotting function

The plotting function is a bit of a beast, it has to be admitted. Just consider that there are three parts.

  1. Some small data preparation to get the bid and ask movements. The midpoint is also calculated, although neither it nor the bid movement is actually drawn.
  2. The first plot creates the bid-ask line range chart on the top.
  3. The second plot creates the volume bar chart below the bid-ask chart.

# The plot function
plotChart <- function(){
  # Here we get the data
  envPrevData <- get("envPrevData", envir = .GlobalEnv, mode = "environment")
  dfStockData <- envPrevData$dfStockData
  # We only want to plot if we have five or more points
  if(nrow(dfStockData) > 4){
    # 1. We prepare the data
    AskMovement <- factor(sign(c(0, diff(dfStockData$Ask))), levels = c(-1, 0, 1), 
        labels = c("Down", "No Change", "Up"))
    BidMovement <- factor(sign(c(0, diff(dfStockData$Bid))), levels = c(-1, 0, 1), 
        labels = c("Down", "No Change", "Up"))
    VolumeChange <- c(0, diff(dfStockData$Volume))
    dfStockData <- data.frame(dfStockData, AskMovement, BidMovement, VolumeChange)
    dfStockData$Mid <- with(dfStockData, .5*(Bid + Ask))
    # 2. This is the first plot for the Bid-Ask
    bAPlot <- ggplot(dfStockData, aes(Time, Mid,
      ymin = Bid, ymax= Ask, colour = AskMovement))
    bAPlot <- bAPlot + geom_linerange(lwd = 1.5) + xlab("") + ylab("Price\n")
    bAPlot <- bAPlot +  theme(legend.position = "top", 
        plot.margin = unit(c(0, .5, -1.5, 0), "lines"), 
        axis.text.y = element_text(angle = 90), axis.text.x = element_blank()) + 
        labs(colour = "Ask Movement") + 
        xlim(range(dfStockData$Time)) + 
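        # Named values keep each colour tied to its level even when a level is absent
        scale_colour_manual(values = c("Down" = "red", "No Change" = "blue", "Up" = "green"))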
    # 3. This is the Volume change plot
    VolPlot <- ggplot(dfStockData, aes(Time, VolumeChange, fill = AskMovement)) + 
        geom_bar(stat = "identity")
    # The bars are mapped to fill, so the fill scale is the one to set
    VolPlot <- VolPlot + theme(legend.position = "none", 
        plot.margin = unit(c(0, .5, 0, 0), "lines"), 
        axis.text.y = element_text(angle = 90)) + 
        xlab("\nTime") + ylab("Volume Change\n") +
        xlim(range(dfStockData$Time)) + 
        scale_fill_manual(values = c("Down" = "red", "No Change" = "blue", "Up" = "green"))
    # We use grid arrange to arrange the plots
    grid.arrange(bAPlot, VolPlot, nrow = 2, heights = c(1.5, 1))
  }
}
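
The data preparation in step 1 is easier to follow on a toy example. The sketch below applies the same sign(diff()) recipe to made-up ask prices and volumes (the numbers are invented purely for illustration).

```r
# Made-up ask prices and cumulative volumes for four consecutive polls
Ask <- c(100.0, 100.5, 100.5, 100.2)
Volume <- c(1000, 1500, 1500, 1800)

# The first observation has no predecessor, so it is padded with 0 ("No Change")
AskMovement <- factor(sign(c(0, diff(Ask))), levels = c(-1, 0, 1),
    labels = c("Down", "No Change", "Up"))
VolumeChange <- c(0, diff(Volume))

data.frame(Ask, AskMovement, Volume, VolumeChange)
```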

Executing the process

Finally, we bring everything together and run the code.

# Running the process
CurrTime <- Sys.time()
while(Sys.time() < CurrTime + 60*60){
  UpdateStockData("GOOG")
  plotChart()
  Sys.sleep(30)
}
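
One thing to be aware of is that the download in GetLiveData() is not wrapped in try(), so a single failed request would stop the loop. A possible variation, sketched here with a hypothetical helper PollFor() that is not part of the original code, wraps each iteration in tryCatch() so the loop carries on after an error.

```r
# Hypothetical helper: call fnStep() repeatedly for nSeconds, pausing
# nPause seconds between calls and carrying on if a call fails.
# Returns (invisibly) the number of failed calls.
PollFor <- function(fnStep, nSeconds, nPause) {
  nFailures <- 0
  tEnd <- Sys.time() + nSeconds
  while (Sys.time() < tEnd) {
    bOk <- tryCatch({
      fnStep()
      TRUE
    }, error = function(e) {
      message("Step failed: ", conditionMessage(e))
      FALSE
    })
    if (!bOk) nFailures <- nFailures + 1
    Sys.sleep(nPause)
  }
  invisible(nFailures)
}

# Usage with the functions from this post (not run here):
# PollFor(function() { UpdateStockData("GOOG"); plotChart() }, 60 * 60, 30)
```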

Conclusion

This is clearly a quick and dirty example, but it shows how you can start constructing your own analysis tool based on ‘live’ quotes from Yahoo. The Yahoo Finance CSV API is very easy to use, but there are some unfortunate problems. The bid and ask volumes can come down with commas as thousands separators, which is a great shame for a CSV-formatted object. I have no idea why commas should be in numbers in the first place, unless you take them as decimal points, in which case a different field separator should be used.
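
As a rough illustration of the comma problem, the sketch below splits a made-up line in which the last field carries thousands separators, then glues the surplus pieces back together. It assumes that the affected number is the final field and that no other unquoted field contains commas; the line itself is invented, not real output from the API.

```r
# A made-up line whose last field (2,345,678) has thousands separators
sLine <- '"Example Corp","EXMP",101.25,101.10,2,345,678'

# scan() splits on commas but respects the quoted name
vParts <- scan(text = sLine, what = "", sep = ",", quiet = TRUE)

# Assumption: everything from the fifth piece onwards belongs to one number
nVolume <- as.numeric(paste(vParts[5:length(vParts)], collapse = ""))
nVolume
```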

To get interactive graphics, the Google Charts API is good, and the googleVis package is convenient for R programmers.

The complete code for this blog is located here.