I explored different ways to fetch historic stock data using SAS Proc HTTP method and most suitable way was to use Yahoo Finance API which offer free download. I used Yahoo chart API along with Proc HTTP procedure of SAS to extract data in form of JSON file and convert it to SAS Tables for further analysis. Data is not as clean as I would like and require transformation before we can use it.
Whole process is based on three parts
- Get Cookies for crawling yahoo API for data download
- Get Stock historical data in form of JSON file using PROC HTTP
- Transform data to easily readable table form.
/* use WORK location to store our temp files */ filename out "%sysfunc(getoption(WORK))\output.txt"; filename hdrout "%sysfunc(getoption(WORK))\response1.txt"; /* This PROC step caches the cookie for the website finance.yahoo.com */ /* and captures the web page for parsing later */ proc http out=out headerout=hdrout url="https://finance.yahoo.com/quote/AAPL/history?p=AAPL" method="get"; *debug level=3; quit; /* Read the response and capture the cookie value from */ /* the CrumbStore field. */ /* The file has very long lines, longer than SAS can */ /* store in a single variable. So we read in <32k chunks. */ data crumb (keep=crumb); infile out recfm=n lrecl=32767; /* the @@ directive says DON'T advance pointer to next line */ input txt: $32767. @@; pos = find(txt,"CrumbStore"); if (pos>0) then do; crumb = dequote(scan(substr(txt,pos),3,':{}')); /* cookie value can have unicode characters, so must URLENCODE */ call symputx('getCrumb',urlencode(trim(crumb))); output; end; run;
In code above, you can see that we are using https://finance.yahoo.com/quote/ for getting a cookie which we will use in later step. I prefer to store all JSON file in work library location, so that all files deleted at end of SAS Session. At end, I used data step to get cookie in macro variable
Next is to download historical data using Yahoo finance chart API. I used URL https://query1.finance.yahoo.com/v7/finance/chart/&stock?
There are different parameter you can use to download relevant data.
/*Lets fetch Historic data from Microsoft*/ %let stock=MSFT; filename test "%sysfunc(getoption(WORK))\&stock..json"; proc http url="https://query1.finance.yahoo.com/v7/finance/chart/&stock?range=1mo%str(&)interval=1d%str(&)indicators=quote%str(&)includeTimestamps=true%str(&)includeTimestamps=true%str(&)crumb=&getCrumb." out=test; *debug level=3; quit; LIBNAME stock JSON "%sysfunc(getoption(WORK))\&stock..json";
While querying Yahoo finance chart API, you can use filter as per below Parameters:
- Valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
- Either Use period parameter or use start and end
- Valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
- Intraday data cannot extend last 60 days
- Download start date string (YYYY-MM-DD) or _datetime.
- Default is 1900-01-01
- Download end date string (YYYY-MM-DD) or _datetime.
Below is a sample downloaded data that needs data manipulation
Next step is to use data step to make this table in analysis ready form.
LIBNAME stock JSON "%sysfunc(getoption(WORK))\&stock..json"; DATA &stock._1; length p6 $15.; SET stock.alldata; where P4 ="symbol" or p4 =: "timestamp" or p6 =: "volume" or p6 =: "open" or p6 =: "close" or p6 =: "high" or p6 =: "low" ; if p6 ="" then p6 =p4; keep p6 value; run; data symbol (keep=Value rename=(value=symbol)) timestamp (keep=Value rename=(value=timestamp)) volume(keep=Value rename=(value=volume)) open (keep=Value rename=(value=open)) close (keep=Value rename=(value=close)) high (keep=Value rename=(value=high)) low (keep=Value rename=(value=low)); set &stock._1; if p6 ="symbol" then output symbol; else if p6 =: "volume" then output volume; else if p6 =: "open" then output open ; else if p6 =: "close" then output close; else if p6 =: "high" then output high; else if p6 =: "low" then output low; else if p6 =: "timestamp" then output timestamp; run; data &stock._2; if (eof1 and eof2 and eof3 and eof4 and eof5 and eof6 and eof7) then stop; if not eof1 then set symbol end=eof1 ; if not eof2 then set timestamp end=eof2 ; if not eof3 then set volume end=eof3 ; if not eof4 then set open end=eof4 ; if not eof5 then set close end=eof5 ; if not eof6 then set high end=eof6 ; if not eof7 then set low end=eof7 ; run; data &stock._3(rename=(timestamp_n=timestamp volume_n =volume open_n = open close_n = close high_n = high low_n = low)); set &stock._2; timestamp_n = input(timestamp , 20.); /*timestamp_n = timestamp_n + 315619200;*/ timestamp_n = dhms('01jan1970'd,0,0, timestamp_n + gmtoff()); volume_n = input(volume , 20.); open_n = input(open , 8.); close_n = input(close , 8.); high_n = input(high , 8.); low_n = input(low , 8.); drop volume close open high low timestamp; format timestamp_n datetime20.; run;
Thats it, SAS Table containing historic stock data is available for statistical analysis.
We can also get same data using python yfinance package and get data in form of dataframe but for me I like SAS :).
– Pallav