Thanks for coming to my little corner of the inter webs. I try to keep this webpage up to date and mildly interesting, but no promises.
Getting started with the Charlottesville Open Data Portal’s API is super simple in R, thanks to the awesome library(geojsonio). There is a brief markdown document in the ‘Tech Docs’ section that shows how to download two different types of datasets from the Portal. In only a couple of lines of code you can be done downloading and happily mapping your data.
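As a rough sketch of the reading step (not the code from the Tech Docs write-up), the snippet below writes a tiny, made-up GeoJSON file to a temporary path and reads it back with geojson_read(). The point, its name and its coordinates are invented for illustration; for real data you would pass a dataset's GeoJSON URL from the Portal in place of the local path.

```r
library(geojsonio)

# Stand-in for a Portal download: a tiny, made-up GeoJSON file written locally
# so the example runs offline. The feature and its name are hypothetical.
geojson_text <- '{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "properties": {"name": "Example point"},
    "geometry": {"type": "Point", "coordinates": [-78.48, 38.03]}
  }]
}'
path <- tempfile(fileext = ".geojson")
writeLines(geojson_text, path)

# geojson_read() accepts a local file path or a URL;
# what = "list" returns the parsed GeoJSON as a plain R list
fc <- geojson_read(path, parse = FALSE, what = "list")

fc$type              # top-level GeoJSON type
length(fc$features)  # number of features in the collection
```

Switching to what = "sp" returns a spatial object instead of a list, which is usually the more convenient form for mapping.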
At yesterday’s Data Bootcamp by Smart Cville I shared my CrimeCast Shiny app for mapping the city’s top 16 most frequent crime types. I used ggmap::geocode() on the block addresses in the ODP’s police reports to show the spatial distribution, with a slider input for showing only certain times of year. There is a big spike in towing around the football stadium in the fall. Maybe I was wrong about the Wahoo students, and it’s the Wahoo fans who are causing the fall ruckus in Charlottesville.
I just posted a first draft of my towing time-series analysis, which uses the city’s brand new Open Data Portal to forecast towing events. I also take a quick look at the size, scope and quality of the currently available Crime Data. I’m planning to follow this post up by expanding into map-based viz and developing time-of-day forecasts to help people decide when and where it is a good time to risk a quick illegal park for convenience.
Earlier this year I built a Shiny app to visualize historical Gallup polling data for US Presidents. In light of the recent events in my hometown of Charlottesville and the rest of the country, I decided to go back and rework the code to make my visualization better.
I originally wanted this app to be solid visual evidence for just how historically abnormal the current public opinion of the Executive branch really is. My first build used loess regression to fit a smoothed trend line for the “average” Republican and Democratic President in an effort to show how today compared to a “typical” president.
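For anyone curious what a loess trend line looks like in code, here is a minimal sketch using base R's loess(). The approval numbers are simulated, not Gallup data, and the span value is an arbitrary choice rather than the one used in the app.

```r
# Made-up approval series: a slow decline plus noise (not real Gallup data)
set.seed(42)
polls <- data.frame(day = 1:200)
polls$approval <- 60 - 0.08 * polls$day + rnorm(200, sd = 3)

# Fit a local regression; span sets the share of the data used in each local fit
fit <- loess(approval ~ day, data = polls, span = 0.5)

# Smoothed trend evaluated at the observed days
polls$trend <- predict(fit, newdata = polls)

# Compare roughness: spread of day-to-day changes, raw versus smoothed
sd(diff(polls$approval))
sd(diff(polls$trend))
```

A larger span gives a flatter, more "average" looking line, while a smaller span follows individual polls more closely.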
In the new build I decided to show all of the poll numbers directly, which shows just how well or badly each president fared over his term(s). I also added an option to project just how bad it will get for the 🍊, though only time will tell. But one thing is clear: no other president to date has started as low as the 🍊 has, not even Ford after Nixon resigned amid Watergate. And that could be a good sign for Mike Pence. :)
HTTPS has become the standard for web traffic for many solid security reasons. Unfortunately, right now GitHub Pages does not support using a custom domain AND HTTPS (the standard domain, username.github.io, is automatically served via HTTPS). There is a way forward via CloudFlare, an internet service company that is on a mission to make web traffic safer and more efficient.
There are a couple of tutorials out there for this, but I used the write-up by CloudFlare’s own Juande Ali and highly recommend it. I have a step-by-step picture tutorial for doing this with NameCheap, a domain service company, over in the new Tech Docs section.
I finally finished the Johns Hopkins Data Science Specialization on Coursera! I think this calls for a glass of good bourbon :)
The 10 courses in the specialization took me the better part of a year to work through, but I feel like it was great exposure to various tools in R’s ecosystem. If you are thinking about trying to get started with R, I highly recommend it. For the capstone project, you are tasked with building a Shiny App that performs text prediction. You can find a link to mine on my Shiny page.
I started leading an introduction to R via R for Data Science, written by the tidy lords Garrett Grolemund and Hadley Wickham. This book is a fantastic and free reference, and I am posting the exercise answers here as we go. Hopefully you will find the comments useful, as some of these exercises can get tricky quickly. If you are interested in following, joining or repeating the group, let me know.
I built a scraper for ASAPSports that collects and cleans interview transcripts from a media press event. I then used several different text analysis packages, library(sentimentr), library(readability) and library(tidytext), to quantify the readability and sentiment of the athletes’ responses. Ultimately, I discovered that hockey players are slightly nicer than other professional athletes. Maybe it’s a Canadian thing, eh?
In the spirit of regularly adding updates, I decided to mark down a quick “data break” session I attended today at the UVA R UseRs group Meetup. We were trying to do a quick (~1 hour) exploratory analysis of previously unknown data and make some investigative visualizations. I made ggmaps showing the popularity of the National Parks across the USA, and it turns out the Appalachian Mountains and greater San Francisco are pretty cool places to get outside.
Jeez, it’s been almost two months since I updated this thing with anything new. So to fix that, I just uploaded my milestone report from the Coursera Data Science Capstone, over in ‘Data Docs’, or just click on this link.
I will try to enforce a minimum of one update per month going forward, because the world is so big and there are so many cool things in it :)
I’m giving a lightning talk about library(magrittr) and library(purrr) for the UVA R Users Meetup on March 29, 2017. There is an Rmd version of my code demo in the Data Docs menu.
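To give a flavor of how the two packages fit together, here is a small sketch (not the code from the talk) that pipes a list through a couple of purrr functions. The scores list is toy data invented for illustration.

```r
library(magrittr)
library(purrr)

# Toy data: a named list of numeric vectors (invented for illustration)
scores <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# Pipe the list into map_dbl(), which applies mean() to each element
# and returns a named numeric vector
means <- scores %>% map_dbl(mean)

# Chain further steps: keep the elements whose mean exceeds 3,
# then total each remaining vector
big_totals <- scores %>%
  keep(~ mean(.x) > 3) %>%
  map_dbl(sum)

means
big_totals
```

The pipe keeps each step reading left to right, and the map_dbl() suffix guarantees a numeric vector back instead of a list.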