Introduction
I am glad that you decided to take this course! I believe the tools that we will introduce will prove invaluable as you develop your own skills as a researcher and data scientist. Many of these tools have been around since the advent of modern computing. They are tried and tested, and they continue to improve as time goes on. While they can sometimes feel foreign and their interfaces may feel ancient, trust me when I say that there is good reason that they look and feel the way that they do. Don’t be tempted to think that the flashiest tool is the best tool.
This course is actively developed on GitHub using the tools that we will discuss. Don’t worry if some things feel opaque at first. It takes time to get a sense of these tools and ideas. As we go through the course, things that at first seemed opaque will become clear. Sometimes we have to experience things without fully understanding them, gradually building context and experience as we go. And there will always be more to learn.
Context
I believe it is important to consider this course and data science in context. Data science is a broad term, and many people have many takes on what it means. For some, it is about data preprocessing, visualization, and summarization. For others, it is largely about machine learning models. Some people consider themselves more akin to statisticians or mathematicians; others align more closely with computer scientists.
My opinion, and this is biased for sure, is that data science is most closely associated with computer science. Sure, many of the models that we consider are statistical in nature, but more often than not, the tools data scientists use are squarely in the wheelhouse of computing. If many of the tools that we are using relate to computing, then we ought to pay attention to what computer science gives us. Computer science has long been concerned with managing data and using it effectively. Data science may be a fairly recent term, but many of the ideas in this field have been considered by computer scientists.
One of the great things about PhD work is that you get to plot your own course with your studies. This is especially true with the Bredesen Center as it is intended to be interdisciplinary. So let me encourage you to seek out diverse perspectives across the University of Tennessee and Oak Ridge National Lab. They are both great places to study and do research.
When I studied at UT, I straddled the Math, Statistics, and Computer Science departments. If it is not already clear from above, I hold most strongly to the Computer Science department, but both the Math and Statistics departments have their strengths. Feel free to ask me questions about my experience at UT. Though I will be biased, I may be able to help orient you as you seek out courses and research during your time at UT. Fair warning, I will likely encourage you to take plenty of computer science courses.