Exploring the AustralianPoliticians R Package

Australia federated in 1901.[1] Rohan Alexander is unusually interested in the history of Australian politicians, and he decided to convert some of his knowledge into an R package, the appropriately named AustralianPoliticians. In brief, the package has datasets that contain information on every person who has sat in the House of Representatives (MPs) or the Senate since 1901. This post is a shameless plug for that package,[2] and shows you how to read in and play around with the data.

Install the package and load in the data

First, let’s load in some packages we need and install the AustralianPoliticians package. It’s not on CRAN so you’ll need to install it from GitHub.

library(tidyverse)
library(lubridate)
devtools::install_github("RohanAlexander/AustralianPoliticians")

The AustralianPoliticians package has a series of datasets built into it. Let’s read in the main dataset all and the MP and Senate datasets:

all <- AustralianPoliticians::all %>% as_tibble()
by_division_mps <- AustralianPoliticians::by_division_mps %>% as_tibble()
by_state_senators <- AustralianPoliticians::by_state_senators %>% as_tibble()

The README on GitHub has good explanations of what each dataset contains. Briefly, the all dataset contains one row for each politician, and has information on their name, gender, date of birth, date of death, Wikipedia page etc. The by_division_mps and by_state_senators datasets have info on which electoral divisions / states each politician held. Note that these can change over time, so there can be more than one row/observation per politician. There are dates the positions were held, the reason why the position ended (defeated, resigned, died etc), and other interesting info. The tables are easily joined to the all dataset based on the uniqueID column. There are other datasets available based on party and whether or not the person was a Prime Minister.
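As a minimal sketch of that join, here is a toy example with made-up rows. The two small tables and their values are hypothetical stand-ins, not data from the package; only the uniqueID key and the mpsDivision column name are taken from the real datasets.

```r
library(dplyr)

# Hypothetical stand-in for the `all` table: one row per politician
people <- tibble::tibble(
  uniqueID    = c("A1900", "B1910"),
  displayName = c("Person, A", "Person, B")
)

# Hypothetical stand-in for `by_division_mps`: one row per seat held
seats <- tibble::tibble(
  uniqueID    = c("A1900", "A1900", "B1910"),
  mpsDivision = c("Division X", "Division Y", "Division Z")
)

# Join on the shared key; someone who held two seats yields two rows
joined <- left_join(people, seats, by = "uniqueID")
joined
```

Naming the key with by = "uniqueID" avoids relying on dplyr guessing the join columns, which matters once the tables share more than one column name.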

Deaths of Australian politicians

Because I’m a demographer, and a fun sort of person, I wanted to look at the mortality of politicians. The following bit of code calculates the age of death for all those who have died, as well as the year and age they were first elected:

deaths <- all %>% 
  rowwise() %>% 
  # some people only have a birth year available, let's arbitrarily say they were born in the middle of the year
  mutate(birth_final = as_date(ifelse(is.na(birthDate),
                                      ymd(paste(birthYear, 06, 30, sep="-")), 
                                      birthDate))) %>% 
  select(uniqueID, displayName, deathDate, birth_final) %>%  
  # calculate age at death
  mutate(age_at_death = interval(birth_final, deathDate)/years(1)) %>% 
  # filter(!is.na(age_at_death)) %>% 
  # join on MP and senate info
  left_join(by_state_senators) %>% 
  left_join(by_division_mps) %>% 
  group_by(uniqueID) %>% 
  # just keep the initial election
  filter(row_number()==1) %>% 
  mutate(year_first_active = ifelse(is.na(senatorsFrom), year(mpsFrom), year(senatorsFrom)), 
         age_active = ifelse(is.na(senatorsFrom), 
                            interval(birth_final, mpsFrom)/years(1), 
                            interval(birth_final, senatorsFrom)/years(1)),
         birth_year = year(birth_final)) %>% 
  ungroup()
deaths
## # A tibble: 1,776 x 22
##    uniqueID displayName deathDate  birth_final age_at_death senatorsState
##    <chr>    <chr>       <date>     <date>             <dbl> <chr>        
##  1 Abbott1… Abbott, Ri… 1940-02-28 1859-06-30          80.7 VIC          
##  2 Abbott1… Abbott, Pe… 1940-09-09 1869-05-14          71.3 NSW          
##  3 Abbott1… Abbott, Mac 1960-12-30 1877-07-03          83.5 NSW          
##  4 Abbott1… Abbott, Au… 1975-04-30 1886-01-04          89.3 <NA>         
##  5 Abbott1… Abbott, Jo… 1965-05-07 1891-10-18          73.6 <NA>         
##  6 Abbott1… Abbott, To… NA         1957-11-04          NA   <NA>         
##  7 Abel1939 Abel, John  NA         1939-06-25          NA   <NA>         
##  8 Abetz19… Abetz, Eric NA         1958-01-25          NA   TAS          
##  9 Adams19… Adams, Jud… 2012-03-31 1943-04-11          69.0 WA           
## 10 Adams19… Adams, Dick NA         1951-04-29          NA   <NA>         
## # … with 1,766 more rows, and 16 more variables: senatorsFrom <date>,
## #   senatorsTo <date>, senatorsEndReason <chr>, senatorsSec15Sel <int>,
## #   senatorsComments <chr>, mpsDivision <chr>, mpsState <chr>,
## #   mpsEnteredAtByElection <chr>, mpsFrom <date>, mpsTo <date>,
## #   mpsEndReason <chr>, mpsChangedSeat <int>, mpsComments <chr>,
## #   year_first_active <dbl>, age_active <dbl>, birth_year <dbl>
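The birth-date fill-in and the age calculation can also be written without rowwise(). The sketch below is one possible vectorised alternative, not the code used in this post. The toy table mirrors the first two rows of the output above, and it assumes the 1859 row had only a birth year available.

```r
library(dplyr)
library(lubridate)

# Toy rows mirroring the printed output (second row assumed to lack a birth date)
toy <- tibble::tibble(
  birthDate = as.Date(c("1869-05-14", NA)),
  birthYear = c(1869, 1859),
  deathDate = as.Date(c("1940-09-09", "1940-02-28"))
)

toy <- toy %>%
  mutate(
    # Use the known birth date; otherwise fall back to 30 June of the birth year
    birth_final  = coalesce(birthDate, make_date(birthYear, 6, 30)),
    # Length of the interval expressed in years
    age_at_death = interval(birth_final, deathDate) / years(1)
  )
toy
```

Because coalesce() and make_date() both work on whole columns, there is no need to wrap the result in as_date() or to process one row at a time.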

So what proportion of all politicians have died? Almost 56%:

sum(!is.na(deaths$age_at_death))/nrow(deaths)
## [1] 0.5579955

Let’s look at the proportion of politicians who have died by birth year:

deaths %>% 
  group_by(birth_year) %>% 
  summarise(proportion = sum(!is.na(age_at_death))/n()) %>% 
  ggplot(aes(birth_year, proportion)) + 
  geom_point() + 
  theme_bw(base_size = 12) + 
  ggtitle("Proportion of politicians who are dead by birth year")

So all politicians born before 1916 are now dead. In contrast, no politician born after 1963 has died so far. The oldest living politician is George Pearce, who is almost 102:

deaths %>% 
  filter(is.na(age_at_death)) %>% 
  arrange(birth_year) %>% 
  filter(row_number()==1) %>% 
  mutate(age = interval(birth_final, today())/years(1)) %>% 
  select(displayName, age) 
## # A tibble: 1 x 2
##   displayName      age
##   <chr>          <dbl>
## 1 Pearce, George  102.

Average age at death by cohort

Let’s look at the average age of death of these politicians and compare it to the national average. I got the national data from the Australian Institute of Health and Welfare’s website. The indicator is \(e_{45} + 45\) for males, which is the expected age at death for those who lived at least until age 45. I didn’t want to compare to the usual life expectancy at birth, because we know that politicians already have to survive long enough to become politicians. Looking at the average age that people entered parliament, 45 is not too far off:

deaths %>%  
  summarise(mean(age_active, na.rm = T))
## # A tibble: 1 x 1
##   `mean(age_active, na.rm = T)`
##                           <dbl>
## 1                          45.2
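In standard life-table notation, \(e_{45}\) is the expected number of further years lived by someone who has reached exact age 45, so the indicator is an expected age at death conditional on surviving to 45. Writing \(X\) for age at death (a symbol introduced here only for illustration):

\[
\mathbb{E}[X \mid X \ge 45] = 45 + e_{45}
\]

For example, a hypothetical \(e_{45}\) of 25 years would correspond to an expected age at death of 70.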

I use males because there have been hardly any women in parliament (:( ). Let’s read in the national data and calculate a year mid-point:

e45 <- read_csv("e45.csv")    
## Parsed with column specification:
## cols(
##   Year = col_character(),
##   e45 = col_double()
## )
e45 <- e45 %>% 
  mutate(start_year = as.numeric(str_sub(Year, 1,4)),
         end_year = as.numeric(str_sub(Year, 6,9)),
         year = floor((start_year+end_year)/2))
e45
## # A tibble: 37 x 5
##    Year        e45 start_year end_year  year
##    <chr>     <dbl>      <dbl>    <dbl> <dbl>
##  1 1881–1890  68         1881     1890  1885
##  2 1891–1900  69         1891     1900  1895
##  3 1901–1910  69.8       1901     1910  1905
##  4 1920–1922  71         1920     1922  1921
##  5 1932–1934  71.9       1932     1934  1933
##  6 1946–1948  71.8       1946     1948  1947
##  7 1953–1955  72.2       1953     1955  1954
##  8 1960–1962  72.4       1960     1962  1961
##  9 1965–1967  72         1965     1967  1966
## 10 1970–1972  72.1       1970     1972  1971
## # … with 27 more rows
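The mid-point step relies on the start and end years sitting at fixed character positions. A pattern-based variant is sketched below as an alternative, not as the code used in this post. It takes the first and last four digits of each label, so it does not depend on which dash character separates them. The two labels are copied from the table above.

```r
library(stringr)

# Two period labels from the printed table; "\u2013" is the en dash
period <- c("1881\u20131890", "1920\u20131922")

# Leading and trailing four-digit years, whatever the separator is
start_year <- as.numeric(str_extract(period, "^\\d{4}"))
end_year   <- as.numeric(str_extract(period, "\\d{4}$"))

# Same rule as above: round the mid-point down to a whole year
mid_year <- floor((start_year + end_year) / 2)
mid_year
```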

Now graph the average age at death for politicians and the national data that we have. The size of the dot represents the number of people who died from that cohort.

deaths %>% 
  full_join(e45 %>% rename(birth_year = year)) %>% 
  filter(age_at_death>0) %>% 
  group_by(birth_year, e45) %>% 
  summarise(mean_age = mean(age_at_death), 
            deaths = n()) %>% 
  ggplot(aes(birth_year, mean_age)) + geom_point(aes(size = deaths)) +
  geom_point(aes(birth_year, e45, color = 'National average'), size = 4, pch = 10) +
  scale_color_manual(name = "", values = c("National average" = "red")) + 
  scale_size_continuous(name = "number of deaths") +
  ylab("average age at death (years)") + xlab("birth year") + 
  ggtitle("Average age at death of Australian politicians by birth year") + 
  theme_bw(base_size = 12) 

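So the average age at death for politicians is generally well above the national average. There’s a steep drop in the later years, from about 1935 onwards, because these cohorts are still fairly young: many of their members are still alive, so only those who died relatively early show up in the average. The ten politicians born after 1935 who died youngest are listed below, along with their reason for leaving parliament: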

deaths %>% 
  filter(!is.na(age_at_death), birth_year>1935) %>% 
  arrange(age_at_death) %>% 
  mutate(reason_leaving = ifelse(is.na(senatorsEndReason), mpsEndReason, senatorsEndReason)) %>% 
  select(displayName, birth_year, age_at_death, reason_leaving) %>% 
  filter(row_number() %in% 1:10)
## # A tibble: 10 x 4
##    displayName    birth_year age_at_death reason_leaving
##    <chr>               <dbl>        <dbl> <chr>         
##  1 Knight, John         1943         37.3 Died          
##  2 Kirwan, Frank        1937         39.0 Defeated      
##  3 Gerick, Jane         1963         40.7 Defeated      
##  4 Wilton, Greg         1955         44.6 Died          
##  5 Bell, Robert         1950         51.1 Defeated      
##  6 West, Andrea         1952         57.6 Defeated      
##  7 Vigor, David         1939         58.8 Defeated      
##  8 Knott, Peter         1956         59.2 Defeated      
##  9 Young, Mick          1936         59.5 Resigned      
## 10 Haines, Janine       1945         59.5 Term Expired

Summary

If you’ve ever wanted to know about Australian politicians, this is the package for you. These data could be combined with data from other sources, for example Twitter data to study more recent politicians, Hansard data, or data from other countries for international comparisons. This is also a great dataset to study a relatively privileged group of society.


  1. “Our nation was created with a vote, not a war”: h/t to this ad for teaching me the name of Australia’s first Prime Minister.

  2. Because why else is one in a relationship if not to have someone advertise your R package.