Take Home Exercise 3

Author

Zachary Wong

Published

February 14, 2024

Modified

April 7, 2024

The Task

In this take-home exercise, the requirements are as follows:

  • Select a weather station and download historical daily temperature or rainfall data from the Meteorological Service Singapore website,

  • Select either daily temperature or rainfall records of a month of the year 1983, 1993, 2003, 2013 and 2023 and create an analytics-driven data visualisation,

  • Apply appropriate interactive techniques to enhance the user experience in data discovery and/or visual story-telling.

Background

The Ministry of Sustainability and the Environment has released an infographic stating that the daily mean temperature is projected to increase by 1.4°C to 4.6°C in the coming years, and that the contrast between the wet months (November to January) and dry months (February and June to September) is likely to become more pronounced.

Source: Ministry of Sustainability and the Environment

The Data

According to World Meteorological Organisation (WMO) guidelines, a climate station monitors the climate over a long period to provide data that enable the detection of climate change signals at a national level. Such stations should be located relatively far from large urban centers and hold at least 30 years of rainfall and temperature data.

In Singapore, the climate station has been moved several times since its inception approximately 140 years ago. The table below shows its previous and current locations.

Location of Singapore Climate Stations

For our study, the data are obtained from the Meteorological Service Singapore’s website. The infographic above states that the contrast between the wet and dry months is likely to become more pronounced, so a wet month is a natural place to look for change. December is therefore chosen for each of the years in focus, and the analysis uses the rainfall data.

Downloading of Data

The following information was used to select and download the data from the Meteorological Service Singapore’s website:

  • Station selected: Serangoon

  • Month selected: December

  • Output Files (in CSV):

    File Name               Description
    DAILYDATA_S36_198312    Weather data from December 1983
    DAILYDATA_S36_199312    Weather data from December 1993
    DAILYDATA_S36_200312    Weather data from December 2003
    DAILYDATA_S36_201312    Weather data from December 2013
    DAILYDATA_S36_202312    Weather data from December 2023

Launching R Packages

As we move through the analysis, additional packages may be installed and loaded later. For exploration, the following libraries are loaded first:

  • tidyverse

  • patchwork

  • plotly

  • crosstalk

  • DT

  • ggdist

  • ggridges

  • ggstatsplot

  • ggthemes

  • dplyr

  • plyr

  • readr

  • gganimate

  • ggiraph
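The loading code is not shown, so the following is only a minimal sketch of one way to attach the packages listed above. It assumes the packages are already installed and reports any that are not, rather than installing them. Two details are worth noting: dplyr and readr are part of the core tidyverse, so they do not need separate entries, and plyr is placed first because loading plyr after dplyr masks several dplyr functions.

```r
# Packages from the list above; plyr goes first so that dplyr's verbs
# (attached via tidyverse) are not masked by plyr's versions.
pkgs <- c("plyr", "tidyverse", "patchwork", "plotly", "crosstalk", "DT",
          "ggdist", "ggridges", "ggstatsplot", "ggthemes",
          "gganimate", "ggiraph")

# Check which packages are available without attaching them yet.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (any(!installed)) {
  message("Not installed: ", paste(pkgs[!installed], collapse = ", "))
}

# Attach only the packages that were found.
invisible(lapply(pkgs[installed], library, character.only = TRUE))
```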

Importing and Preparing the Data for Analysis

The five downloaded files needed some preparation before they could be read into R, because the read_csv function could not parse two symbols in them. Every dash (-) was replaced with the value “0”, and every degree symbol (°) was removed.

The following columns will later be dropped from our study as they contain no records:
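As an alternative to editing the files by hand, the dashes can be declared as missing values at import time. The sketch below is not the method used in this exercise; it uses a few made-up rows and assumes that a lone dash marks a missing reading. Its point is that a missing reading then stays distinct from a true 0 mm day. The degree symbol is usually an encoding matter, which read_csv can address through `locale(encoding = ...)`, although the encoding of the downloaded files would need to be checked.

```r
library(readr)

# Made-up rows for illustration only; "-" stands in for a missing reading.
raw <- I(paste(
  "Day,Daily Rainfall Total (mm)",
  "1,16.4",
  "2,-",
  "3,0",
  sep = "\n"
))

# Treat a lone dash as NA instead of replacing it with 0 beforehand.
demo <- read_csv(raw, na = c("", "NA", "-"), show_col_types = FALSE)

rain_col <- demo$`Daily Rainfall Total (mm)`

# The mean over recorded days differs from the mean with dashes set to 0.
mean_recorded  <- mean(rain_col, na.rm = TRUE)
mean_zero_fill <- mean(ifelse(is.na(rain_col), 0, rain_col))
```

With missing readings kept as NA, the choice of how to treat them (drop, impute, or flag) becomes an explicit step in the analysis.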

  • Highest 30 Min Rainfall (mm)

  • Highest 60 Min Rainfall (mm)

  • Highest 120 Min Rainfall (mm)

  • Mean Temperature (°C)

  • Maximum Temperature (°C)

  • Minimum Temperature (°C)

  • Mean Wind Speed (km/h)

  • Max Wind Speed (km/h)

Reading the File

# A tibble: 31 × 13
   Station    Year Month   Day Daily Rainfall Total (mm…¹ Highest 30 Min Rainf…²
   <chr>     <dbl> <dbl> <dbl>                      <dbl>                  <dbl>
 1 Serangoon  1983    12     1                        6                        0
 2 Serangoon  1983    12     2                        1                        0
 3 Serangoon  1983    12     3                        3.6                      0
 4 Serangoon  1983    12     4                        0                        0
 5 Serangoon  1983    12     5                        0                        0
 6 Serangoon  1983    12     6                        0                        0
 7 Serangoon  1983    12     7                        0.3                      0
 8 Serangoon  1983    12     8                        4.1                      0
 9 Serangoon  1983    12     9                       56.1                      0
10 Serangoon  1983    12    10                       18.8                      0
# ℹ 21 more rows
# ℹ abbreviated names: ¹​`Daily Rainfall Total (mm)`,
#   ²​`Highest 30 Min Rainfall (mm)`
# ℹ 7 more variables: `Highest 60 Min Rainfall (mm)` <dbl>,
#   `Highest 120 Min Rainfall (mm)` <dbl>, `Mean Temperature (C)` <dbl>,
#   `Maximum Temperature (C)` <dbl>, `Minimum Temperature (C)` <dbl>,
#   `Mean Wind Speed (km/h)` <dbl>, `Max Wind Speed (km/h)` <dbl>
# A tibble: 31 × 13
   Station    Year Month   Day Daily Rainfall Total (mm…¹ Highest 30 Min Rainf…²
   <chr>     <dbl> <dbl> <dbl>                      <dbl>                  <dbl>
 1 Serangoon  1993    12     1                       50.5                      0
 2 Serangoon  1993    12     2                        4.2                      0
 3 Serangoon  1993    12     3                       28                        0
 4 Serangoon  1993    12     4                        1.2                      0
 5 Serangoon  1993    12     5                       33.8                      0
 6 Serangoon  1993    12     6                        8.5                      0
 7 Serangoon  1993    12     7                        0                        0
 8 Serangoon  1993    12     8                       20.9                      0
 9 Serangoon  1993    12     9                        0                        0
10 Serangoon  1993    12    10                       12.5                      0
# ℹ 21 more rows
# ℹ abbreviated names: ¹​`Daily Rainfall Total (mm)`,
#   ²​`Highest 30 Min Rainfall (mm)`
# ℹ 7 more variables: `Highest 60 Min Rainfall (mm)` <dbl>,
#   `Highest 120 Min Rainfall (mm)` <dbl>, `Mean Temperature (C)` <dbl>,
#   `Maximum Temperature (C)` <dbl>, `Minimum Temperature (C)` <dbl>,
#   `Mean Wind Speed (km/h)` <dbl>, `Max Wind Speed (km/h)` <dbl>
# A tibble: 31 × 13
   Station    Year Month   Day Daily Rainfall Total (mm…¹ Highest 30 Min Rainf…²
   <chr>     <dbl> <dbl> <dbl>                      <dbl>                  <dbl>
 1 Serangoon  2003    12     1                        0.6                      0
 2 Serangoon  2003    12     2                        6                        0
 3 Serangoon  2003    12     3                        0                        0
 4 Serangoon  2003    12     4                        0                        0
 5 Serangoon  2003    12     5                        5.5                      0
 6 Serangoon  2003    12     6                        0.4                      0
 7 Serangoon  2003    12     7                        0.2                      0
 8 Serangoon  2003    12     8                        0.2                      0
 9 Serangoon  2003    12     9                       24.3                      0
10 Serangoon  2003    12    10                       12.7                      0
# ℹ 21 more rows
# ℹ abbreviated names: ¹​`Daily Rainfall Total (mm)`,
#   ²​`Highest 30 Min Rainfall (mm)`
# ℹ 7 more variables: `Highest 60 Min Rainfall (mm)` <dbl>,
#   `Highest 120 Min Rainfall (mm)` <dbl>, `Mean Temperature (C)` <dbl>,
#   `Maximum Temperature (C)` <dbl>, `Minimum Temperature (C)` <dbl>,
#   `Mean Wind Speed (km/h)` <dbl>, `Max Wind Speed (km/h)` <dbl>
# A tibble: 31 × 13
   Station    Year Month   Day Daily Rainfall Total (mm…¹ Highest 30 Min Rainf…²
   <chr>     <dbl> <dbl> <dbl>                      <dbl>                  <dbl>
 1 Serangoon  2013    12     1                       20                        0
 2 Serangoon  2013    12     2                        8.6                      0
 3 Serangoon  2013    12     3                       92.9                      0
 4 Serangoon  2013    12     4                        0.2                      0
 5 Serangoon  2013    12     5                       23                        0
 6 Serangoon  2013    12     6                       59.2                      0
 7 Serangoon  2013    12     7                        0                        0
 8 Serangoon  2013    12     8                       25.2                      0
 9 Serangoon  2013    12     9                        0                        0
10 Serangoon  2013    12    10                        0                        0
# ℹ 21 more rows
# ℹ abbreviated names: ¹​`Daily Rainfall Total (mm)`,
#   ²​`Highest 30 Min Rainfall (mm)`
# ℹ 7 more variables: `Highest 60 Min Rainfall (mm)` <dbl>,
#   `Highest 120 Min Rainfall (mm)` <dbl>, `Mean Temperature (C)` <dbl>,
#   `Maximum Temperature (C)` <dbl>, `Minimum Temperature (C)` <dbl>,
#   `Mean Wind Speed (km/h)` <dbl>, `Max Wind Speed (km/h)` <dbl>
# A tibble: 31 × 13
   Station    Year Month   Day Daily Rainfall Total (mm…¹ Highest 30 min Rainf…²
   <chr>     <dbl> <dbl> <dbl>                      <dbl>                  <dbl>
 1 Serangoon  2023    12     1                       16.4                      3
 2 Serangoon  2023    12     2                        0                        0
 3 Serangoon  2023    12     3                        0                        0
 4 Serangoon  2023    12     4                        0                        0
 5 Serangoon  2023    12     5                        0                        0
 6 Serangoon  2023    12     6                        0                        0
 7 Serangoon  2023    12     7                        0                        0
 8 Serangoon  2023    12     8                        0                        0
 9 Serangoon  2023    12     9                        0                        0
10 Serangoon  2023    12    10                        0                        0
# ℹ 21 more rows
# ℹ abbreviated names: ¹​`Daily Rainfall Total (mm)`,
#   ²​`Highest 30 min Rainfall (mm)`
# ℹ 7 more variables: `Highest 60 min Rainfall (mm)` <dbl>,
#   `Highest 120 min Rainfall (mm)` <dbl>, `Mean Temperature (C)` <dbl>,
#   `Maximum Temperature (C)` <dbl>, `Minimum Temperature (C)` <dbl>,
#   `Mean Wind Speed (km/h)` <dbl>, `Max Wind Speed (km/h)` <dbl>
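The code that produced the five tibbles above is not shown. A minimal sketch of reading all monthly files in one pass follows; the folder, the ".csv" extension and the two tiny stand-in files are assumptions made so that the example runs without the real downloads.

```r
library(readr)

# Hypothetical stand-in for the download folder: two tiny files written to
# a temporary directory, using the first two days printed above.
data_dir <- file.path(tempdir(), "mss_demo")
dir.create(data_dir, showWarnings = FALSE)

writeLines(c("Station,Year,Month,Day,Daily Rainfall Total (mm)",
             "Serangoon,1983,12,1,6",
             "Serangoon,1983,12,2,1"),
           file.path(data_dir, "DAILYDATA_S36_198312.csv"))
writeLines(c("Station,Year,Month,Day,Daily Rainfall Total (mm)",
             "Serangoon,1993,12,1,50.5",
             "Serangoon,1993,12,2,4.2"),
           file.path(data_dir, "DAILYDATA_S36_199312.csv"))

# Find every monthly file for station S36 and read each into a tibble.
files <- list.files(data_dir,
                    pattern = "^DAILYDATA_S36_[0-9]{6}\\.csv$",
                    full.names = TRUE)
monthly <- lapply(files, read_csv, show_col_types = FALSE)

# Name each tibble after its file so a given year is easy to pick out.
names(monthly) <- tools::file_path_sans_ext(basename(files))
monthly[["DAILYDATA_S36_198312"]]
```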

Combining the Files

# A tibble: 165 × 606
   Station    Year Month   Day Daily Rainfall Total (mm…¹ Highest 30 Min Rainf…²
   <chr>     <dbl> <dbl> <dbl>                      <dbl>                  <dbl>
 1 Serangoon  1983    12     1                        6                        0
 2 Serangoon  1983    12     2                        1                        0
 3 Serangoon  1983    12     3                        3.6                      0
 4 Serangoon  1983    12     4                        0                        0
 5 Serangoon  1983    12     5                        0                        0
 6 Serangoon  1983    12     6                        0                        0
 7 Serangoon  1983    12     7                        0.3                      0
 8 Serangoon  1983    12     8                        4.1                      0
 9 Serangoon  1983    12     9                       56.1                      0
10 Serangoon  1983    12    10                       18.8                      0
# ℹ 155 more rows
# ℹ abbreviated names: ¹​`Daily Rainfall Total (mm)`,
#   ²​`Highest 30 Min Rainfall (mm)`
# ℹ 600 more variables: `Highest 60 Min Rainfall (mm)` <dbl>,
#   `Highest 120 Min Rainfall (mm)` <dbl>, `Mean Temperature (C)` <dbl>,
#   `Maximum Temperature (C)` <dbl>, `Minimum Temperature (C)` <dbl>,
#   `Mean Wind Speed (km/h)` <dbl>, `Max Wind Speed (km/h)` <dbl>, …
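One detail visible in the printed tibbles is that the 2023 file spells its headers "Highest 30 min Rainfall (mm)" with a lowercase "min", while the earlier files use "Min". When rows are bound by column name, such a mismatch creates extra columns instead of stacking the values. The sketch below, which assumes dplyr's bind_rows is used for the combination and uses only a few values from the output above, shows the effect and one way to harmonise the names first. It does not attempt to explain the full column count reported above.

```r
library(dplyr)

# Two days each from the 2013 and 2023 tibbles printed above.
dec_2013 <- tibble(
  Year = 2013, Day = 1:2,
  `Daily Rainfall Total (mm)`    = c(20, 8.6),
  `Highest 30 Min Rainfall (mm)` = c(0, 0)
)
dec_2023 <- tibble(
  Year = 2023, Day = 1:2,
  `Daily Rainfall Total (mm)`    = c(16.4, 0),
  `Highest 30 min Rainfall (mm)` = c(3, 0)   # lowercase "min"
)

# Binding as-is keeps both spellings as separate, half-empty columns.
naive <- bind_rows(dec_2013, dec_2023)

# Harmonise the spelling before binding so the columns line up.
harmonise <- function(df) {
  rename_with(df, ~ gsub(" min ", " Min ", .x, fixed = TRUE))
}
combined <- bind_rows(lapply(list(dec_2013, dec_2023), harmonise))
```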

Selecting the Columns to Keep

tibble [165 × 3] (S3: tbl_df/tbl/data.frame)
 $ Year                     : Factor w/ 5 levels "1983","1993",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Day                      : num [1:165] 1 2 3 4 5 6 7 8 9 10 ...
 $ Daily Rainfall Total (mm): num [1:165] 6 1 3.6 0 0 0 0.3 4.1 56.1 18.8 ...
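The structure above (three columns, with Year stored as a factor) can be reached with a select followed by a type conversion. This is a minimal sketch on a small stand-in table built from values printed earlier; the object names `combined` and `rain` are hypothetical.

```r
library(dplyr)

# Stand-in for the combined table, with a few of the original columns.
combined <- tibble(
  Station = "Serangoon",
  Year    = c(1983, 1983, 1993),
  Month   = 12,
  Day     = c(1, 2, 1),
  `Daily Rainfall Total (mm)`    = c(6, 1, 50.5),
  `Highest 30 Min Rainfall (mm)` = 0
)

# Keep only the columns used in the analysis and store Year as a factor
# so that plots treat it as a category rather than a continuous number.
rain <- combined %>%
  select(Year, Day, `Daily Rainfall Total (mm)`) %>%
  mutate(Year = factor(Year))

str(rain)
```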

Next, we inspect the new data table for duplicates and check that the columns have been correctly selected.

# A tibble: 9 × 3
  Year    Day `Daily Rainfall Total (mm)`
  <fct> <dbl>                       <dbl>
1 <NA>     NA                          NA
2 <NA>     NA                          NA
3 <NA>     NA                          NA
4 <NA>     NA                          NA
5 <NA>     NA                          NA
6 <NA>     NA                          NA
7 <NA>     NA                          NA
8 <NA>     NA                          NA
9 <NA>     NA                          NA
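One reading of the output above, offered as an assumption rather than a confirmed diagnosis: five files of 31 days give 155 daily records, yet the combined table has 165 rows, and a duplicate check that returns nine all-NA rows is what ten completely empty rows would produce (the first is kept and the other nine are flagged). The sketch below uses a small made-up table with the same three columns to separate empty rows from a duplicate check on the Year and Day key.

```r
library(dplyr)

# Made-up table: three real records followed by two completely empty rows.
rain <- tibble(
  Year = factor(c("1983", "1983", "2023", NA, NA)),
  Day  = c(1, 2, 1, NA, NA),
  `Daily Rainfall Total (mm)` = c(6, 1, 16.4, NA, NA)
)

# Rows in which every column is NA.
empty_rows <- rain %>% filter(if_all(everything(), is.na))

# Drop the empty rows, then look for repeated Year/Day combinations.
rain_clean <- rain %>% filter(!if_all(everything(), is.na))
dup_keys   <- rain_clean %>% count(Year, Day) %>% filter(n > 1)

nrow(empty_rows)  # number of empty rows found
nrow(dup_keys)    # number of Year/Day keys that occur more than once
```

Checking duplicates on the key columns, rather than on whole rows, also avoids flagging two different days that happen to share the same rainfall value.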

Visualizing the Data

The check flags only rows that are entirely NA, so no actual daily record is duplicated. We now explore the data through several visualizations to understand it better.

Daily Rainfall in December By Year

Based on the charts above, the number of days with rainfall above the overall five-year average seems to increase from year to year, with the exception of 2023. That year differs markedly from the other selected years, with only one day of recorded rainfall (16.4 mm on 1 December) in the entire month.

As 2023 is an outlier, we exclude it from our comparison and analysis.

Another observation from the above charts is that, even though days with above-average rainfall are becoming more frequent, the mean daily rainfall does not seem to differ much from year to year.
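The charts themselves are not reproduced in this text. As a rough sketch of the idea behind them, the code below draws daily rainfall per year against the overall mean and counts the days above it. It uses only the first ten days of December 1983 and 2013 as printed earlier, so the counts illustrate the method and are not the full-month result; the chart type and the names `rain`, `overall_mean` and `days_above` are assumptions.

```r
library(ggplot2)

# First ten days of December 1983 and 2013, from the tibbles printed above.
rain <- data.frame(
  Year = factor(rep(c("1983", "2013"), each = 10)),
  Day  = rep(1:10, times = 2),
  Rain = c(6, 1, 3.6, 0, 0, 0, 0.3, 4.1, 56.1, 18.8,
           20, 8.6, 92.9, 0.2, 23, 59.2, 0, 25.2, 0, 0)
)

# Overall mean across both years, used as the reference line.
overall_mean <- mean(rain$Rain)

# Number of days per year with rainfall above the overall mean.
days_above <- tapply(rain$Rain > overall_mean, rain$Year, sum)

# One panel per year, with the overall mean drawn as a dashed line.
p <- ggplot(rain, aes(x = Day, y = Rain)) +
  geom_col() +
  geom_hline(yintercept = overall_mean, linetype = "dashed") +
  facet_wrap(~ Year, nrow = 1) +
  labs(x = "Day of December", y = "Daily Rainfall Total (mm)")
p
```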

The above chart repeats the earlier chart with the outliers removed, to give a clearer view of the average rainfall for each year. Although days with higher-than-average rainfall became more frequent, the year-by-year comparison does not show the significant increase that the infographic suggests.

However, according to the bar chart below, total December rainfall rose from 288.9 mm in 1983 to 516.0 mm in 2013, although not steadily: the 2003 total (301.1 mm) was lower than the 1993 total (326.9 mm).
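The comparison method behind the chart is not stated. One hedged way to put a number on "no significant difference" is a Kruskal-Wallis rank-sum test, chosen here as an assumption because daily rainfall is heavily skewed. The sketch uses only the first ten days of each December printed earlier and leaves out 2023, so its result illustrates the procedure rather than the full-month conclusion.

```r
# First ten days of December for each year, from the tibbles printed above.
rain <- data.frame(
  Year = factor(rep(c("1983", "1993", "2003", "2013"), each = 10)),
  Rain = c(6, 1, 3.6, 0, 0, 0, 0.3, 4.1, 56.1, 18.8,
           50.5, 4.2, 28, 1.2, 33.8, 8.5, 0, 20.9, 0, 12.5,
           0.6, 6, 0, 0, 5.5, 0.4, 0.2, 0.2, 24.3, 12.7,
           20, 8.6, 92.9, 0.2, 23, 59.2, 0, 25.2, 0, 0)
)

# Non-parametric test of whether the daily rainfall distributions differ
# between years; it compares ranks, so large outliers carry less weight.
kw <- kruskal.test(Rain ~ Year, data = rain)

kw$statistic  # test statistic
kw$p.value    # a small value would suggest the years differ
```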

  Group.1     x
1    1983 288.9
2    1993 326.9
3    2003 301.1
4    2013 516.0
5    2023  16.4
  Group.1          x
1    1983  9.3193548
2    1993 10.5451613
3    2003  9.7129032
4    2013 16.6451613
5    2023  0.5290323
'data.frame':   5 obs. of  3 variables:
 $ Year : num  1983 1993 2003 2013 2023
 $ Mean : num  9.32 10.55 9.71 16.65 0.53
 $ Total: num  288.9 326.9 301.1 516 16.4
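The `Group.1` and `x` column names above are what base R's aggregate() returns when it is called with a `by` list. A minimal sketch follows, using the first ten days of December 1983 and 2013 printed earlier; the formula interface is shown as an alternative that keeps readable column names. The last step checks that the printed means are the printed totals divided by the 31 days of December.

```r
# First ten days of December 1983 and 2013, from the tibbles printed above.
rain <- data.frame(
  Year = rep(c(1983, 2013), each = 10),
  Rain = c(6, 1, 3.6, 0, 0, 0, 0.3, 4.1, 56.1, 18.8,
           20, 8.6, 92.9, 0.2, 23, 59.2, 0, 25.2, 0, 0)
)

# The `by` interface yields the generic column names Group.1 and x.
agg <- aggregate(rain$Rain, by = list(rain$Year), FUN = sum)

# The formula interface keeps the original column names.
named <- aggregate(Rain ~ Year, data = rain, FUN = sum)

# Monthly totals as printed above; each mean is the total over 31 days.
summary_tbl <- data.frame(
  Year  = c(1983, 1993, 2003, 2013, 2023),
  Total = c(288.9, 326.9, 301.1, 516.0, 16.4)
)
summary_tbl$Mean <- summary_tbl$Total / 31
summary_tbl
```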

Interactive Visualization

The next two charts are interactive versions of the first chart. One shows the changes across time; the other lets the reader select which information to display, in the hope of enhancing the analysis.

Interactive Chart 1

Warning

The interactive chart below is meant to show rainfall across the years. However, due to an error in R, the column “Year” (a factor) is currently not being recognized correctly, which has hindered both this study and the chart's interactivity.
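The error itself is not shown, so the following is only a guess at one common cause. Converting a factor directly to a number returns its internal level codes, not the labels, so a Year factor becomes 1 to 5 instead of 1983 to 2023. If the animation was built with gganimate, it may also be relevant that `transition_time()` expects a continuous time variable, while `transition_states()` is the function intended for discrete values such as a factor.

```r
# Year stored as a factor, as in the structure printed earlier.
year_fct <- factor(c("1983", "1993", "2003", "2013", "2023"))

# Direct conversion returns the level codes, not the years.
codes <- as.integer(year_fct)

# Converting through character recovers the actual years.
years <- as.integer(as.character(year_fct))

codes  # level codes
years  # calendar years
```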

Interactive Chart 2
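The chart for this section is not reproduced in this text. As a rough sketch of a selection-driven chart, and assuming the crosstalk and plotly packages from the list above are the ones intended, a shared data object can link a year filter to a bar chart. The data are the first ten days of December 1983 and 2013 printed earlier; the names `shared` and `widget` are hypothetical.

```r
library(plotly)
library(crosstalk)

# First ten days of December 1983 and 2013, from the tibbles printed above.
rain <- data.frame(
  Year = factor(rep(c("1983", "2013"), each = 10)),
  Day  = rep(1:10, times = 2),
  Rain = c(6, 1, 3.6, 0, 0, 0, 0.3, 4.1, 56.1, 18.8,
           20, 8.6, 92.9, 0.2, 23, 59.2, 0, 25.2, 0, 0)
)

# A shared data object lets the filter and the chart react to each other.
shared <- SharedData$new(rain)

# Year selector on the left, daily rainfall bars on the right.
widget <- bscols(
  widths = c(3, 9),
  filter_select("year", "Year", shared, ~Year),
  plot_ly(shared, x = ~Day, y = ~Rain, type = "bar")
)
widget
```

Choosing one or more years in the selector filters the bars without re-running any R code, which suits a static rendered page.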

Additional Interactivity and Other EDA of Data
