Kaggle panel recap: my data science journey

Introduction

This past March I had the distinct pleasure of participating in a panel about making the career transition to data science as part of Kaggle’s CareerCon 2018. As a result of this experience, I’ve gotten enough emails asking for more information about my data science journey that it warrants a blog post, per David Robinson’s advice:


How did Kaggle know about your story?

Short answer: Twitter.

Longer answer: Rachael is a Kaggle employee, and we have some overlap in our Twitter communities. She had seen some of my tweets about data science and thought that I might be a good fit for the upcoming panel discussion.

I don’t always tweet about career transitions in data science, but I do keep my Twitter bio section pretty focused on what I do and how I got there:

What motivated you to develop your career in data science? Was it driven by your personal interest or the demand of the job (such as a particular problem), or both?

Short answer: sheer dumb luck.

Long answer: I “found” data science when I was at an incredibly low point in my life.

It was 2013, and I had just impulsively moved cross-country (from NYC to Seattle) with a boyfriend I started having second thoughts about the moment we left Brooklyn. By the time we got to Seattle the relationship was in a state of utter disrepair, but my lack of a job and the legally binding lease we had both signed meant we were living together for at least a year.

Shortly after finding a teaching job in Seattle, I was laid off from it, which, for someone who sees teaching as their calling in life, was utterly devastating.

My life was quickly spiraling out of control, and so I coped by playing astronomical amounts of World of Warcraft. This led to starting a guild, building a community, and developing an online network of friends that helped ease the anxiety of living in a new city. Eventually I developed enough confidence to explore non-teaching career options, and “programming” seemed interesting.

I enrolled in my local community college’s “Intro to Computer Programming 101” course and struggled with the content so much that I stopped doing the assignments and ultimately failed the course. I re-enrolled with a different professor the following semester, hit the same roadblocks, and this time managed to drop out before I failed out.

Having lost all confidence in my ability to do anything besides play video games, I created a Twitter account in the hopes of finding a job opportunity that was related to gaming in some fashion. After a couple of days I noticed a company had a posting for volunteer Community Managers, so I applied and was onboarded to write Hearthstone articles.

Part of my responsibilities as a Community Manager involved participating in weekly phone calls with staff, and somehow my experience working with data during my time in graduate school came up during one of these calls. Shortly after that I was asked about my interest in doing “data work” for the company, and since playing WoW wasn’t paying the bills, I jumped at the chance to interview. During the interview process I started Googling for data jobs, came across the term “data science,” and figured I knew enough to be proficient at the basics, and could teach myself the rest in order to be a competitive candidate.

I completed the interview process - which consisted of a lot of Googling and trying to do work with massive data sets in Excel - and when they offered me the job, I told them I wanted to be brought on as a data scientist. They agreed with the title, and here we are.

Data science is such a broad subject. How did you decide which areas to specialize in?

I don’t recall ever making a conscious decision to specialize in a given field; rather, I relied on what I knew and what I was passionate about to guide where I practiced data science. While I started out in gaming, I eventually transitioned to non-profit work, first with the Girl Scouts and later with Teaching Trust.

What were the most difficult challenges you faced when building a data science career?

This would be the logical place to say imposter syndrome, but it would be disingenuous. I don’t suffer from imposter syndrome, and while I can empathize with those who grapple with it, it’s not something that I’ve personally experienced.

I spend a lot of time reflecting on my knowledge base and am acutely aware of what I know and what I don’t know. I have always been upfront and honest about my skills and abilities in the hiring process. Fun fact: I was once offered a job doing work in Python, even after telling the interviewer that I didn’t know Python and worked in R instead. (They said that as long as I learned Python they had no concerns, although ultimately I turned the job down.)

From a personal perspective I struggle with the overwhelming growth of content and technologies in data science. It’s easy to look at everything you still have to learn and feel defeated before you even get started. I’ve found it helpful to mitigate this by doing two things:

  1. Reminding myself that I will never, ever know everything there is to know
  2. Aligning my learning path with my profession, so that I can utilize both professional and personal time to deepen my skill set and understanding

From a professional perspective I still struggle with contracts, non-disclosure agreements, and intellectual property. I got burned pretty badly by an NDA I signed early on in my career, and it’s made me very wary of any organization that asks me to sign NDAs or non-competes.

Which learning programs/tools did you use? Any you could recommend?

Honestly, I relied heavily on prior statistics coursework and Google for the first few years I was learning data science. Since I was largely learning on the job, I didn’t think I had time to invest in an online course: it felt like it would take too long to reach the point in a course where the material was relevant to the work I was doing.

In the past year and a half I’ve started to invest time in formalizing my data science education. I struggle with video-based courses and definitely prefer reading books in order to learn. The texts that have been indispensable to me have been:

The online resources I recommend are:

From the above it’s relatively easy to surmise that I work largely in R. I also benefit greatly from having a solid background in both math and science, and so have always felt comfortable transferring those concepts to data science.

With all of that said, I’ve begun to learn Python for data science as well, and am excited about the following resources:

While these are all great resources, they won’t get you very far if you aren’t applying what you’ve learned on a regular basis. Create a GitHub account and start building things - they don’t have to be grandiose or even useful beyond your own personal learning, but creating them will help you develop your skills and build out a portfolio of work.
