Justin Pho

Data Science

  • Flexibility matters to data scientists: they need to build methods and simulations tailored to the problem at hand. Building your own models is important because you should not have to make unrealistic assumptions just to use an ill-suited method that already exists. In a high-level programming language like R, Python, or MATLAB, you can create new, customized, reproducible tools for yourself whenever you like.
  • Manipulating sets of numbers will be your stock-in-trade as a data scientist. The two most important components of the R language are objects, which store data, and functions, which automate tasks and perform complicated calculations.
  • function() constructs a custom R function: you put the body of code inside {} and save the result to an R object, so that the function can be called by that object's name.
  • The foundation of data science is the ability to store large amounts of data and recall values on demand. However, storing data is not the only logistical task that you will face as a data scientist. You will often want to do tasks with your data that are so complex or repetitive that they are difficult to do without a computer. Some of these tasks can be done with functions that already exist in R and its packages, but others cannot.
  • You will be the most versatile as a data scientist if you can write your own programs for computers to follow.
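
As a minimal sketch of the two ideas above (objects that store data, and function() for building a custom tool), the R snippet below may help. The values in heights and the name center are hypothetical, chosen only for illustration; they are not from the notes.

```r
# Store data in an object (hypothetical values, for illustration only).
heights <- c(150, 160, 170, 180)

# Build a custom function with function(): the body goes inside {},
# and the result is saved to an R object named `center` (a hypothetical name).
center <- function(x) {
  # Subtract the mean from every element; the last expression is returned.
  x - mean(x)
}

# Call the function by the name of the object it was saved to.
centered <- center(heights)
print(centered)
```

Because the function is saved as an ordinary object, it can be reused on any numeric vector, which is what makes a repetitive task cheap to repeat.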