Data Preparation

United Kingdom

This UK data set probably came from the Office for National Statistics:

https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/lowersuperoutputareapopulationdensity

but I can no longer locate the exact file I used.

Read in the Excel worksheet with population densities for the UK:

Rows: 420
Columns: 10
$ Code                            <chr> "K02000001", "K03000001", "K04000001",…
$ Name                            <chr> "UNITED KINGDOM", "GREAT BRITAIN", "EN…
$ Geography                       <chr> "Country", "Country", "Country", "Coun…
$ `Area (sq km)`                  <dbl> 242740.8699, 228947.9193, 151046.9877,…
$ `Estimated Population mid-2021` <dbl> 67026292, 65121729, 59641829, 56536419…
$ `2021 people per sq. km`        <dbl> 276.12281, 284.43905, 394.85613, 433.8…
$ `Estimated Population mid-2011` <dbl> 63285145, 61470827, 56170927, 53107169…
$ `2011 people per sq. km`        <dbl> 260.71071, 268.49262, 371.87717, 407.5…
$ `Estimated Population mid-2001` <dbl> 59113016, 57424178, 52359978, 49449746…
$ `2001 people per sq. km`        <dbl> 243.52313, 250.81765, 346.64695, 379.4…

The details in the Excel file suggest that I should retain council areas, local government districts, London boroughs, unitary authority districts, counties, and metropolitan counties to cover the entire UK. In other words, the Country and Region rows are aggregates of areas that the other categories already cover.

I will remove the Country and Region rows and make a new data frame with only the names, the 2021 population density figures, and a Country column with the value “UK”.

Rows: 404
Columns: 3
$ Name    <chr> "County Durham", "Darlington", "Hartlepool", "Middlesbrough", …
$ density <dbl> 234.22216, 548.02082, 987.77170, 2667.58968, 64.05668, 557.457…
$ Country <chr> "UK", "UK", "UK", "UK", "UK", "UK", "UK", "UK", "UK", "UK", "U…
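
The filtering step described above could look roughly like the following. This is a minimal base R sketch, not the code originally used: the toy data frame `uk_raw` stands in for the worksheet, and the "Unitary Authority" labels and the "NORTH EAST" Region row are illustrative assumptions rather than values confirmed by the source.

```r
# Toy stand-in for the worksheet (assumption: Geography labels and the
# Region row are illustrative; only the densities come from the output above).
uk_raw <- data.frame(
  Name = c("UNITED KINGDOM", "NORTH EAST", "County Durham", "Darlington"),
  Geography = c("Country", "Region", "Unitary Authority", "Unitary Authority"),
  `2021 people per sq. km` = c(276.12281, NA, 234.22216, 548.02082),
  check.names = FALSE,       # keep the spreadsheet-style column name intact
  stringsAsFactors = FALSE
)

# Drop the aggregate rows so that each area is counted only once.
keep <- !(uk_raw$Geography %in% c("Country", "Region"))

# Keep the name and the 2021 density, and tag every row with the country.
uk <- data.frame(
  Name = uk_raw$Name[keep],
  density = uk_raw[["2021 people per sq. km"]][keep],
  Country = "UK",
  stringsAsFactors = FALSE
)

print(uk)
```

With the real worksheet, the same logical filter would reduce the 420 rows to the 404 shown above.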

United States

I did not record the source of this data set of county populations and densities.

Read in the .csv file with the density data by County for the US.

Rows: 3,220
Columns: 23
$ OBJECTID                <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
$ COUNTYNS                <chr> "00161526", "00161527", "00161528", "00161529"…
$ GEOID                   <chr> "01001", "01003", "01005", "01007", "01009", "…
$ ALAND                   <dbl> 1539602123, 4117546676, 2292144655, 1612167481…
$ AWATER                  <dbl> 25706961, 1133055836, 50538698, 9602089, 15015…
$ NAME                    <chr> "Autauga County", "Baldwin County", "Barbour C…
$ State                   <chr> "Alabama", "Alabama", "Alabama", "Alabama", "A…
$ B25010_001E             <dbl> 2.59, 2.61, 2.49, 2.99, 2.77, 2.75, 2.94, 2.49…
$ B25010_001M             <dbl> 0.05, 0.04, 0.07, 0.14, 0.05, 0.15, 0.12, 0.04…
$ B25010_002E             <dbl> 2.59, 2.66, 2.44, 3.05, 2.85, 2.71, 2.89, 2.47…
$ B25010_002M             <dbl> 0.07, 0.06, 0.11, 0.18, 0.07, 0.20, 0.14, 0.04…
$ B25010_003E             <dbl> 2.60, 2.48, 2.58, 2.81, 2.48, 2.88, 3.04, 2.52…
$ B25010_003M             <dbl> 0.16, 0.11, 0.15, 0.38, 0.15, 0.41, 0.24, 0.10…
$ B01001_001E             <dbl> 55200, 208107, 25782, 22527, 57645, 10352, 200…
$ B01001_001M             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ B01001_calc_PopDensity  <dbl> 35.853419, 50.541504, 11.247981, 13.973114, 34…
$ created_user            <chr> "esri_demographics", "esri_demographics", "esr…
$ created_date            <chr> "2020/04/01 20:53:36+00", "2020/04/01 20:53:36…
$ last_edited_user        <chr> "esri_demographics", "esri_demographics", "esr…
$ last_edited_date        <chr> "2020/04/01 20:53:36+00", "2020/04/01 20:53:36…
$ B01001_calc_PopDensityM <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ SHAPE_Length            <dbl> 2.066037, 4.483746, 2.695262, 1.887514, 2.4235…
$ SHAPE_Area              <dbl> 0.1502559, 0.4099041, 0.2232702, 0.1564733, 0.…

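From this data set I will retain all rows except those for Puerto Rico, and the columns for the name, state, and population density. I will also create a Country column with the value “USA”.

These population densities are indeed per square kilometer, the same as in the UK data set. For example, Autauga County's population of 55,200 divided by its land area of about 1,539.6 sq km (ALAND is given in square meters) is 35.85, which matches the reported density.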
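
One way to confirm the units is to recompute the density from the population and land area columns. The sketch below uses the first three counties shown in the output above and assumes that `ALAND` is land area in square meters and that `B01001_001E` is the total population estimate; the data frame `counties` is a hypothetical stand-in for the full file.

```r
# First three counties, copied from the output above.
counties <- data.frame(
  NAME = c("Autauga County", "Baldwin County", "Barbour County"),
  ALAND = c(1539602123, 4117546676, 2292144655),   # assumed square meters
  B01001_001E = c(55200, 208107, 25782),           # assumed total population
  B01001_calc_PopDensity = c(35.853419, 50.541504, 11.247981),
  stringsAsFactors = FALSE
)

# Convert land area to square kilometers and recompute people per sq km.
area_sq_km <- counties$ALAND / 1e6
recomputed <- counties$B01001_001E / area_sq_km

# Largest absolute gap between the recomputed and the reported densities.
max_diff <- max(abs(recomputed - counties$B01001_calc_PopDensity))

print(data.frame(NAME = counties$NAME, recomputed = round(recomputed, 4)))
print(max_diff)
```

A gap near zero supports reading the reported figure as people per square kilometer of land area.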

Rows: 3,142
Columns: 4
$ Name    <chr> "Autauga County", "Baldwin County", "Barbour County", "Bibb Co…
$ State   <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Alabam…
$ density <dbl> 35.853419, 50.541504, 11.247981, 13.973114, 34.515816, 6.41762…
$ Country <chr> "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",…
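
A minimal base R sketch of this selection step follows. It assumes that Puerto Rico rows can be identified by the value "Puerto Rico" in the `State` column; the toy data frame `us_raw` and its "Example Municipio" row are hypothetical.

```r
# Toy stand-in for the county file (assumption: the last row is a made-up
# Puerto Rico entry used only to show the filter).
us_raw <- data.frame(
  NAME = c("Autauga County", "Baldwin County", "Example Municipio"),
  State = c("Alabama", "Alabama", "Puerto Rico"),
  B01001_calc_PopDensity = c(35.853419, 50.541504, NA),
  stringsAsFactors = FALSE
)

# Keep every row that is not in Puerto Rico.
keep <- us_raw$State != "Puerto Rico"

# Retain and rename the three columns of interest, then add the country tag.
us <- data.frame(
  Name = us_raw$NAME[keep],
  State = us_raw$State[keep],
  density = us_raw$B01001_calc_PopDensity[keep],
  Country = "USA",
  stringsAsFactors = FALSE
)

print(us)
```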

Combined Data

I can bind these two data sets together to compare densities by Country, among other things. The UK rows will have NA for the State column.
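
The binding step might be sketched as below, using two-row stand-ins for each data set. In base R the missing `State` column has to be added to the UK rows explicitly before `rbind()`; the names `uk`, `us`, and `combined` are assumptions for illustration.

```r
# Two-row stand-ins built from the values shown earlier.
uk <- data.frame(
  Name = c("County Durham", "Darlington"),
  density = c(234.22216, 548.02082),
  Country = "UK",
  stringsAsFactors = FALSE
)
us <- data.frame(
  Name = c("Autauga County", "Baldwin County"),
  State = c("Alabama", "Alabama"),
  density = c(35.853419, 50.541504),
  Country = "USA",
  stringsAsFactors = FALSE
)

# UK areas have no state, so fill that column with missing values.
uk$State <- NA_character_

# Put the UK columns in the same order as the US ones, then stack the rows.
combined <- rbind(uk[, names(us)], us)

print(combined)
```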

Summaries and Visualizations

By County or Other Jurisdiction

$UK
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
    8.719   192.068   529.719  1501.621  1992.998 15794.497 

$USA
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
    0.014     6.444    17.266   104.510    45.062 27819.805

The median density of the UK jurisdictions (529.7 people per sq. km) is about 31 times the median of US counties (17.3 people per sq. km).
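
The ratio can be checked directly from the two medians reported in the summaries above. This is only an arithmetic check on the printed values, not a recomputation from the full data.

```r
# Medians copied from the summary output above (people per sq. km).
medians <- c(UK = 529.719, USA = 17.266)

# How many times denser the median UK jurisdiction is than the median US county.
ratio <- medians[["UK"]] / medians[["USA"]]

print(round(ratio, 1))
```

The layout of the summaries above resembles what `tapply(density, Country, summary)` prints, although the source does not show which call produced them.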

What are the median counties in each country?

Least dense counties:

The Shetland Islands would be in the 47th percentile of US counties, close to the median US county.
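
A percentile or rank of this kind can be computed by comparing one density against the full vector of US county densities. The sketch below uses a small hypothetical vector, `us_density`, and two hypothetical helpers, `percentile_of()` and `rank_among()`; none of the numbers are real county values.

```r
# Share of reference values at or below x, expressed as a percentile.
percentile_of <- function(x, reference) {
  100 * mean(reference <= x)
}

# Rank of x from the top: 1 means denser than every reference value.
rank_among <- function(x, reference) {
  sum(reference > x) + 1
}

# Hypothetical densities standing in for the US county column.
us_density <- c(2, 5, 9, 14, 16, 18, 25, 40, 90, 300)

# Hypothetical density for a single UK area.
x <- 15

print(percentile_of(x, us_density))
print(rank_among(x, us_density))
```

Using `<=` counts ties as being at or below the value; a different tie rule would shift the percentile slightly.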

Density boxplots:

Density on Log10 scale:

“Pirate Plots” on Log10 scale:

These show dots for the raw data, bars for the medians, “beans” for the smoothed density, and rectangles (too small to see in these plots) for an inference interval around the medians.

Average Density by Country

Bournemouth, Christchurch, Poole

BCP would be in the 100th percentile of US counties (after rounding), ranked 16th, just after Manassas Park, Virginia.

England and US Population Density

David Himrich

12 September 2024

Introduction