This site was created by me, Ben Birnbaum, as a side project. My day job is Senior Director of Software Engineering at Flatiron Health, where I lead the Machine Learning Team. You can learn more about me (and see some other side projects) at my homepage.
First, a disclosure: I’m not an expert, and there are definitely better ways to compute these probabilities. But in the interest of transparency, I wanted to share the details of my methodology, approximate as it is.
If you have any suggestions for how to improve this, please send me an email.
My primary data source was the Global Health Observatory data repository, maintained by the WHO. For each country, I used the “nMx - age-specific death rate between ages x and x+n” data to compute the probability someone of a given age will die in that year. These death rates are bucketed by age group, and I assumed that the death rate was the same for each year in the age group.
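As a rough illustration of the bucket expansion described above (not the site's actual code), here is a minimal Python sketch. The bucket boundaries and rate values are made up, and the names `BUCKETS`, `rate_to_probability`, and `expand_buckets` are hypothetical. The conversion from a death rate to a yearly probability, q = 1 - exp(-m), is an assumption on my part (it corresponds to a constant hazard within the year); the text above does not say which conversion was used.

```python
import math

# Hypothetical bucketed death rates: (first age, last age inclusive, nMx).
# These numbers are invented for illustration and are not WHO data.
BUCKETS = [
    (0, 0, 0.006),
    (1, 4, 0.0003),
    (5, 9, 0.0001),
]


def rate_to_probability(rate):
    """Convert a yearly death rate to a yearly death probability.

    Assumption: constant hazard within the year, so q = 1 - exp(-m).
    """
    return 1.0 - math.exp(-rate)


def expand_buckets(buckets):
    """Assign every single age in a bucket the same yearly death probability."""
    table = {}
    for first_age, last_age, rate in buckets:
        q = rate_to_probability(rate)
        for age in range(first_age, last_age + 1):
            table[age] = q
    return table


yearly_table = expand_buckets(BUCKETS)
```

Every age inside a bucket ends up with the same value, which is exactly the simplification described above.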
The one exception I made to this assumption is the 85+ bucket, since the probability of death goes up rapidly each year for people in their eighties and nineties. I couldn’t find a data source that had fine-grained death rates for octogenarians and nonagenarians for every country, so instead I used life tables for the U.S. from the CDC. I assumed that these probabilities applied to everyone between the ages of 85 and 100, regardless of which country they were in. This is a big assumption, but I didn’t have a good alternative, and it doesn’t affect any probabilities for estimates before age 85 (which is probably what most people care about).
The CDC data only goes up to age 100, and I couldn't find much data for estimating the probability of death each year for people older than 100. I found a study claiming that for people who are 105 or older, the probability of death in a given year remains constant at 50%. So for ages 100 to 104, I set the probability of death in a given year as the linear interpolation between the probability of death at 99 (0.32) and the probability of death at 105 (0.5). For ages 105 and older, I set the yearly probability of death to be 0.5.
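The interpolation above can be sketched in a few lines of Python. This is only an illustration using the two anchor values quoted in the text (0.32 at age 99 and 0.5 at age 105); the function name `old_age_death_probability` is hypothetical.

```python
def old_age_death_probability(age):
    """Yearly death probability for ages 99 and up.

    Linear interpolation between (99, 0.32) and (105, 0.5),
    then constant at 0.5 from age 105 onward.
    """
    low_age, low_p = 99, 0.32
    high_age, high_p = 105, 0.5
    if age < low_age:
        raise ValueError("this sketch only covers ages 99 and older")
    if age >= high_age:
        return high_p
    fraction = (age - low_age) / (high_age - low_age)
    return low_p + fraction * (high_p - low_p)
```

With these anchors the probability rises by 0.03 per year, giving roughly 0.35, 0.38, 0.41, 0.44, and 0.47 for ages 100 through 104.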
Once I compiled the data as described above, I had a probability that someone would die in a given year, for each possible country, sex, and age. This is what the website uses to compute probabilities interactively.
Each time the user specifies a year of birth, sex, and country, the website runs 40,000 simulations of that person's life. In each simulation the person ages one year at a time, and at each age they die with the probability given by the table. A cumulative distribution function is estimated from these simulations, and the first 1,000 results are shown in the visualization.
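For readers who want to see the idea in code, here is a minimal Python sketch of this kind of simulation. It is not the website's implementation: the function names, the `max_age` cutoff, and the toy probability function are all assumptions made for illustration, and the toy probabilities are not real mortality data.

```python
import random
from collections import Counter


def simulate_age_at_death(death_prob, start_age, rng, max_age=130):
    """Simulate one life: at each age, die with probability death_prob(age).

    max_age is an arbitrary cutoff (an assumption) so the loop always ends.
    """
    age = start_age
    while age < max_age:
        if rng.random() < death_prob(age):
            return age
        age += 1
    return max_age


def empirical_cdf(ages_at_death):
    """Map each observed age to the fraction of simulated lives ended by it."""
    counts = Counter(ages_at_death)
    total = len(ages_at_death)
    cdf = {}
    running = 0
    for age in sorted(counts):
        running += counts[age]
        cdf[age] = running / total
    return cdf


def toy_death_prob(age):
    """Made-up yearly death probability that grows with age, capped at 0.5."""
    return min(0.5, 0.0001 * 1.09 ** age)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
ages = [simulate_age_at_death(toy_death_prob, 30, rng) for _ in range(40_000)]
cdf = empirical_cdf(ages)
```

Here `cdf[80]`, if age 80 appears in the results, would be the estimated probability that a 30-year-old in this toy model dies at or before age 80.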