Getting Started

Mastery of the mathematics and applications of this intuitive statistical concept will advance your credibility as a decision maker.

Photo by Ella Olsson from Pexels

Bayes Theorem gives us a way of updating our beliefs in light of new evidence, taking into account the strength of our prior beliefs. Deploying Bayes Theorem, you seek to answer the question: what is the likelihood of my hypothesis in light of new evidence?
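As a minimal sketch of that update, the snippet below applies Bayes Theorem, P(H|E) = P(E|H) × P(H) / P(E), to a hypothetical diagnostic test. The function name `posterior` and all of the probabilities are illustrative assumptions, not figures from the article.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) using Bayes Theorem."""
    # Total probability of seeing the evidence, whether or not H is true.
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence


# Hypothetical numbers: 1% prior, 95% true positive rate, 5% false positive rate.
belief = posterior(prior=0.01, p_evidence_given_h=0.95, p_evidence_given_not_h=0.05)
print(f"Posterior after one positive result: {belief:.3f}")

# The posterior becomes the new prior when more evidence arrives.
belief = posterior(prior=belief, p_evidence_given_h=0.95, p_evidence_given_not_h=0.05)
print(f"Posterior after a second positive result: {belief:.3f}")
```

With these made-up inputs, a single positive result lifts a 1% prior to roughly 16%, which shows how strongly the prior tempers new evidence.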

In this article, we’ll talk about three ways that Bayes Theorem can improve your practice of Data Science:

By the end, you’ll possess a deep understanding of the foundational concept.

#1 — Updating

Bayes Theorem provides a structure for testing a hypothesis, taking into account the strength of prior assumptions and the new evidence. …


Even in the aftermath of the replication crisis, statistical significance lingers as an important concept for Data Scientists to understand.

Photo by Pixabay from Pexels

There are many types of statistical tests, but null hypothesis significance testing predominates.

With this technique, the objective is to test an observation against the null hypothesis. You can think of the null hypothesis as the status quo. It represents the situation where the intervention does not work.

Significance testing rose to preeminence because it is a useful way to draw inferences about a larger population from a sample of its data. This article will enhance your intuition about this useful data science technique.
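To make the idea concrete, here is a small sketch of a null hypothesis test using only Python's standard library. It assumes a hypothetical coin-flip experiment (60 heads in 100 flips) with a null hypothesis that the coin is fair. The function name `binomial_p_value` and the 0.05 threshold are illustrative choices, not details from the article.

```python
from math import comb


def binomial_p_value(successes, trials, null_p=0.5):
    """Two-sided exact binomial p-value under the null hypothesis."""
    def pmf(k):
        return comb(trials, k) * null_p**k * (1 - null_p) ** (trials - k)

    observed = pmf(successes)
    # Sum the probability of every outcome at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))


p_value = binomial_p_value(successes=60, trials=100)
print(f"p-value: {p_value:.4f}")
print("Reject the null" if p_value < 0.05 else "Fail to reject the null")
```

A small p-value means the observation would be unusual if the status quo held. It does not give the probability that the null hypothesis is true.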

Overview

The goal of the researcher conducting the null hypothesis test is to evaluate whether or not…


Drop in for some tips on how this fundamental statistics concept can improve your data science.

Photo by Cameron Casey from Pexels

The distribution of data refers to the way the data is spread out. In this article, we’ll discuss the essential concepts related to the normal distribution:

Overview

Data distribution is of great importance in statistics because we are pretty much always sampling from a population where the full distribution is unknown. The distribution of our sample may put limitations on the statistical techniques available to us.
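As a rough illustration of sampling from a population, the sketch below draws a sample from a normal distribution with Python's standard library and checks how much of it falls within one standard deviation of the mean. The population parameters (mean 100, standard deviation 15), the sample size, and the seed are arbitrary assumptions.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Draw a sample from a hypothetical normally distributed population.
sample = [random.gauss(100, 15) for _ in range(10_000)]

mean = statistics.mean(sample)
stdev = statistics.stdev(sample)

# For normal data, roughly 68% of values lie within one standard deviation of the mean.
within_one_sd = sum(abs(x - mean) <= stdev for x in sample) / len(sample)

print(f"sample mean: {mean:.1f}, sample stdev: {stdev:.1f}")
print(f"share within one standard deviation: {within_one_sd:.1%}")
```

If the share for your own data is far from 68%, that is one hint that techniques assuming normality may not apply.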


The tools you need to succeed with machine learning in the new year.

Photo by Ian Schneider on Unsplash

Web Resources

🔦 ML Showcase — great for project inspiration, this repository of data science projects from Team Paperspace should certainly get your wheels turning.

🎓 Codecademy: time after time, I find myself recommending this powerful learning platform to folks breaking into coding or looking to pick up new languages. The strength of Codecademy lies in its simplicity: these exercises will get your muscle memory trained up so that you’ll be typing code like a master in no time.

🖌 The Data Visualization Catalogue — this site offers archetypal renderings of all the creative chart options available to help you…


The sixth tool is coffee.

Photo by Chevanon Photography from Pexels

In Stephen Covey’s masterful 7 Habits of Highly Effective People, the seventh habit is “sharpen the saw.” This refers to enhancing our assets to seek continuous improvement in our work. As Abe Lincoln said,

Give me eight hours to chop down a tree, and I will spend the first six sharpening the saw.

Better tools to structure, simplify, and broaden our Data Science work will make us more effective thinkers, decision makers, and practitioners.

In this article, we’ll explore how to sharpen our Data Science saws — and also investigate the unanswered question of who is handing out saws to so…


A step-by-step walkthrough for a simple portfolio project using sklearn’s clustering algorithm to create an interactive dashboard for your city.

Through unsupervised learning, a data scientist can explore an unlabeled dataset to produce categories or clusters. You can use this technique to create a neighborhood explorer tool to help residents and visitors develop an understanding of points of interest near where you live.

via GitHub

You can use a publicly available points of interest dataset like I did, or you could scrape data from Yelp or TripAdvisor.
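For a sense of what the clustering step might look like, here is a minimal sketch using `KMeans` from sklearn. The choice of `KMeans` is my assumption, since the teaser does not say which of sklearn's clustering algorithms the project uses. The coordinates are made-up stand-ins for a real points of interest dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up (latitude, longitude) pairs standing in for points of interest.
points = np.array([
    [38.889, -77.035],
    [38.890, -77.036],
    [38.888, -77.034],
    [38.930, -77.070],
    [38.931, -77.071],
    [38.929, -77.069],
])

# n_clusters=2 is an assumption; in practice you would tune it for your city.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(points)

for (lat, lon), label in zip(points, labels):
    print(f"({lat:.3f}, {lon:.3f}) -> cluster {label}")
```

Each label marks a group of nearby points, which a dashboard could then color or filter by neighborhood.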

In this article:

I was inspired to create this project based on my interest in GIS data and my love of Washington, DC, where I went to…


Office Hours

The size of the digital universe increased 3,000% in the past decade. Here’s how to manage all your organization’s data.

Photo by Dino Reichmuth on Unsplash

This article will help you understand the whys and hows of implementing better data management practices at your organization.

In 2020, the size of our digital universe is 40 zettabytes. By comparison, the world had produced just 1.2 zettabytes of data by the year 2010. That represents explosive growth of 3,000% in a single decade. Obviously, this presents both opportunities and challenges.
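As a quick check of that arithmetic, the sketch below computes the growth from the two figures quoted above. The variable names are illustrative.

```python
data_2010_zb = 1.2   # zettabytes, figure quoted above
data_2020_zb = 40.0  # zettabytes, figure quoted above

growth_multiple = data_2020_zb / data_2010_zb
percent_increase = (data_2020_zb - data_2010_zb) / data_2010_zb * 100

print(f"{growth_multiple:.1f}x the 2010 volume")
print(f"{percent_increase:,.0f}% increase")
```

The result lands a little above 3,200%, consistent with the rounded figure of 3,000%.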

In The Lean Startup, Eric Ries explores how innovation leaders use data to support experimentation. A strong data infrastructure allows an organization to validate hypotheses about customer values. …


Statistics, SQL, Python, and machine learning are all important capabilities to master in the year ahead.

Tools for Data Science detective work. Photo by ian dooley on Unsplash

2020 has been a rough year. Amidst the pandemic, economic fallout, quarantine orders, racial reckoning, a stressful U.S. election, holidays spent separated from loved ones, important milestones passed without the recognition they deserved, and other deeply tragic circumstances, something troubling happened to me.

I lost my jeans.

After six months of Teams calls wearing athletic shorts under a nice top, it had come time to break out the smart-casual dark wash staple to wear to an outdoor happy hour.

I could not find them anywhere. I looked in each box in storage, then in every closet in the house…


Post-COVID, machine learning is increasingly crucial for business success.

Photo by cottonbro from Pexels

COVID-19 accelerated the end of 20th-century trends and entrenched the dominance of 21st-century paradigms. For millions around the world, the pandemic response will forever shape the way we work, where we choose to live, and how we engage in commerce.

I’m telling everyone to update their model. You should really only use data since COVID if possible.

Dr. Carl Gold, former Wall Street quantitative analyst turned Chief Data Scientist and author of Fighting Churn with Data, urges Data Scientists to rethink their approach to modeling customer behavior due to model drift, which occurs when the training dataset no longer faithfully…


Python is the fastest growing, most-beloved programming language. Get started with these Data Science tips.

Photo by Shelby Miller on Unsplash

With Python’s straightforward, human-readable syntax, anyone can access impressive capabilities for scientific computing. Python has become the standard language for data science and machine learning, and it was rated in the top three most loved languages in Stack Overflow’s 2020 Developer Survey.

If you’re a newcomer to this much-loved programming language, here are ten tips to help your Python skillset flourish. You can follow along in this Google Colab notebook (plus, a quick video introduction to Google Colab).

#10 — List comprehensions

A simple, single-line syntax for working with lists, a list comprehension allows you to access and perform an action…
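As a brief illustration of the syntax, the sketch below builds two lists with list comprehensions. The sample data and variable names are made up for this example.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Perform an action on every item: square each number.
squares = [n ** 2 for n in numbers]

# Add a condition to keep only some items: squares of the even numbers.
even_squares = [n ** 2 for n in numbers if n % 2 == 0]

print(squares)       # [1, 4, 9, 16, 25, 36]
print(even_squares)  # [4, 16, 36]
```

Each comprehension replaces a multi-line `for` loop that would otherwise append to an empty list.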

Nicole Janeway Bills

Data Scientist at Atlas Research in Washington, DC | Certified Data Management Professional | www.facebook.com/groups/breakingintodatascience/
