Brandon Walker

Data Scientist

Tutorial: How should you build a data science portfolio?

5 minutes
July 3, 2018

I’m going to go over two ways to quickly get projects onto your résumé. Both require that you have the RStudio IDE installed and that you know a little R. I’ll walk through both in more in-depth articles later, but you may be able to get started with just this.

Rmarkdown and Rpubs

This first project is the one I recommend if you don’t have anything on your résumé yet. I think you should pick a data set that is not common: repeating something that has been done before won’t set you apart. I’d recommend making an account and searching Kaggle Datasets. Even better is making your own data set via an API, but that’s not necessary for now.
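
Once you have a file picked out, a first look at it might go something like the sketch below. The file and its columns (city, year, rides) are made up so the snippet runs on its own; point csv_path at whatever you actually downloaded.

```r
# Stand-in for a CSV downloaded from Kaggle: a tiny made-up file in a temp
# location. Replace csv_path with the path to your own download.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("city,year,rides",
    "Austin,2017,1200",
    "Austin,2018,1500",
    "Boise,2018,300"),
  csv_path
)

rides <- read.csv(csv_path, stringsAsFactors = FALSE)

str(rides)              # column names and types
summary(rides)          # quick ranges and quartiles
colSums(is.na(rides))   # missing values per column
```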

  1. In the RStudio IDE go to File > New File > R Markdown

  2. Select Document then HTML Output.

  3. Learn a bit of Rmarkdown. You can learn the basics of markdown very quickly (seriously, in about 5 minutes) at this link. Side Note: If you’re a statistics major, learning Rmarkdown is extremely useful. If you have to turn in R code and output as an assignment, doing it in Rmarkdown makes things much easier and more professional looking.

  4. Try learning a new machine learning technique that is interesting to you, but I’d recommend beginning with linear regression, logistic regression, or random forest. Then apply this to your data set.

  5. Once your report looks the way you want, click the Knit button at the top of your script.

  6. Then click Publish in the top right of your new document.

  7. Then select RPubs. You’ll have to make an account, and then you can publish your work online.

In the end you will be provided with a URL that you can put on your résumé as a link to your work. Check out RPubs to see other people’s work.
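
For step 4, the code chunk inside your report could start out as small as the sketch below. It uses mtcars, a data set that ships with R, purely as a placeholder for your own data, and the choice of columns is just an assumption for illustration.

```r
# mtcars ships with R; swap in your own data frame and columns.

# Linear regression: miles per gallon explained by weight and horsepower.
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit_lm)

# Logistic regression: am is coded 0/1 (automatic/manual transmission).
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probability of a manual transmission for each car.
predicted_prob <- predict(fit_glm, type = "response")
head(predicted_prob)
```

A random forest would need an extra package (randomForest is a common choice), so I’ve left it out of the sketch.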

Shiny and shinyapps.io

With Shiny you can make an interactive web app. Interactivity is, quite frankly, more impressive to recruiters, so I would make having one of these on your résumé a priority. As with the project above, I’d pick a data set that is interesting to you and has some variable that you can manipulate to update some charts on your page.

  1. In your R console run the command install.packages("shiny")
  2. Go to File > New File > Shiny Web App
  3. Select Single File and name your application.
  4. Some default code will show up. Go ahead and run it to see what it does by clicking Run App in the top right of the script. This is just so you know what you may be building.
  5. There are 4 necessary lines of code here.

    • library(shiny) # loads Shiny
    • ui <- fluidPage( ) # This is where you create the ui (user interface) or what you see
    • server <- function(input, output){ } # This takes the inputs and does the actual work
    • shinyApp(ui = ui, server = server) # Runs the application
  6. To get more information on Shiny layouts go to Help > Cheatsheets > Web Applications with shiny. To get an example of all the things your ui can do, check out the Shiny gallery. Select the sort of layout you’d like from the cheatsheet. Each of the objects in the layouts will be a place for you to put in inputs or some sort of chart or plot.
  7. Inside the layout decide what input you want. If you wanted a splitLayout and wanted a numeric input you would put the code “numericInput(inputId, label, value, min, max, step)” where it says # object 1. What goes in each of these arguments is very important.

    • In the “inputId” argument you will be putting a string that becomes a variable that references the numeric input the user gave. You will need this later!
    • The “label” argument will be the text the user sees
    • The “value” argument gives the default value
    • The “min” and “max” arguments are the smallest and largest values you allow that number to take
    • The “step” argument sets how much the arrows at the side of the input change the number
  8. Now on the server side we can use this number in a function by using input$inputId, with your own inputId in place of the name. The example code that was created when you made a new Shiny app uses a sliderInput with the inputId “bins” and reads it on the server as input$bins. A numericInput with the same inputId could have been used instead.
  9. To create a plot or some other output, you will have to save it into output$outputId using the function renderPlot() (or some other function that likely has the word render in it). The “outputId” will then be used on the ui side to finally make the plot.
  10. In the ui, replace # object 2 from step 7 with plotOutput("outputId")
  11. Finally, run the app and then press the Publish button. Instead of sending it to RPubs (which only handles static content), you will send it to shinyapps.io. You will need to make an account on shinyapps.io, where you can only host about 10 apps a month for free. In the end you will have a URL to your working Shiny app that you can put on your résumé or LinkedIn bio or anywhere else!
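
Putting steps 5 through 10 together, a complete app might look roughly like the sketch below. It borrows the faithful data set that ships with R so it runs without any downloads. The outputId “waitingPlot” and the helper make_breaks() are names I made up for illustration, so use whatever fits your own data.

```r
library(shiny)  # loads Shiny

# Hypothetical helper: turns a bin count into evenly spaced histogram breaks.
# Kept outside the server so it can be checked without launching the app.
make_breaks <- function(x, bins) {
  seq(min(x), max(x), length.out = bins + 1)
}

ui <- fluidPage(
  titlePanel("Old Faithful waiting times"),
  splitLayout(
    # object 1: the numeric input; "bins" is the inputId used on the server
    numericInput(inputId = "bins", label = "Number of bins:",
                 value = 30, min = 1, max = 50, step = 1),
    # object 2: shows whatever the server saves into output$waitingPlot
    plotOutput("waitingPlot")
  )
)

server <- function(input, output) {
  output$waitingPlot <- renderPlot({
    req(input$bins)  # skip drawing while the input box is empty
    hist(faithful$waiting,
         breaks = make_breaks(faithful$waiting, input$bins),
         main = "Waiting time between eruptions",
         xlab = "Minutes")
  })
}

app <- shinyApp(ui = ui, server = server)

# Launch only in an interactive session (for example the RStudio console).
if (interactive()) runApp(app)
```

Saved as app.R, this should also work with the Run App button. Changing the number in the box reruns the renderPlot() block, which is all the interactivity really is.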

Thanks for reading all of this! Let me know if there is anything you think I missed, didn’t explain clearly, or would like to know more about. I’ll cover both Rmarkdown and Shiny in more depth in the future, as they are both great skills to have and there is a lot to them.