Import gridded "Surface UV" data released by the EUMETSAT AC SAF (Atmospheric Composition Monitoring) project from HDF5 files downloaded from the FMI server.

Usage

sUV_read_OUV_hdf5(
  files,
  data.product = NULL,
  group.name = "GRID_PRODUCT",
  vars.to.read = NULL,
  fill = NA_real_,
  keep.QC = TRUE,
  verbose = interactive()
)

sUV_vars_OUV_hdf5(
  files,
  data.product = NULL,
  group.name = "GRID_PRODUCT",
  keep.QC = TRUE,
  set.oper = "intersect"
)

sUV_grid_OUV_hdf5(files, expand = FALSE)

sUV_date_OUV_hdf5(files, use.names = length(files > 1))

Arguments

files

character A vector of file names. Its length is limited only by the memory available to hold the data.

data.product

character Currently only "Surface UV" supported.

group.name

character The name of the 'group' in the HDF5 files, or a regular expression for matching a single group name with grep().

vars.to.read

character A vector of variable names. If NULL all the variables present in the first file are read.

fill

numeric The R value that replaces the fill value used in the files, which is retrieved from the file metadata. It is also used to fill missing variables.

keep.QC

logical Add the quality control variable, always present in the files, to the returned data frame or vector.

verbose

logical Flag indicating whether progress, and the time and size of the returned object, should be printed.

set.oper

character One of "intersect" or "union".

expand

logical Flag indicating whether to return ranges or a full grid.

use.names

logical Should names be added to the returned vector?

Value

Function sUV_read_OUV_hdf5() returns a data frame with columns named "Date", "Longitude", "Latitude", the data variables with their original names, and "QualityFlags". The data variables have their metadata stored as R attributes. sUV_vars_OUV_hdf5() returns a character vector of variable names. sUV_grid_OUV_hdf5() returns a data frame with two numeric variables, Longitude and Latitude, containing either two rows that give the range of the grid or the fully expanded grid, depending on the argument passed to expand. sUV_date_OUV_hdf5() returns a vector of class Date, named with the file names when use.names is TRUE.

Details

Function sUV_read_OUV_hdf5() reads the data stored in one or more files, either in full or only selected variables. The query functions sUV_vars_OUV_hdf5(), sUV_grid_OUV_hdf5() and sUV_date_OUV_hdf5() extract the names of the variables, the range of the grid and the dates of measurements much more efficiently than reading the data with sUV_read_OUV_hdf5(). The dates are decoded from the file names, which are expected to be those set by the data provider.

The grid is expected to be identical in all files imported in a single call to sUV_read_OUV_hdf5(), and grid subsetting is currently not supported. If any of the files named in the argument to files is not accessible, an error is triggered early. If the files differ in the grid, an error is triggered when reading the first mismatching file. Variables named in vars.to.read that are found to be missing when reading the first file are filled with the fill value. Variables missing only from later files trigger an error when an attempt is made to read them.
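Because the dates come from the file names rather than from the file contents, renamed files cannot be dated. The following base R sketch illustrates the idea. It is not the package's implementation: the helper name decode_ouv_date() and the regular expression are assumptions based on the file names used in the Examples, such as "O3MOUV_L3_20240621_v02p02.HDF5".

```r
# Hypothetical helper illustrating date decoding from file names.
# Assumes the date appears as eight digits (YYYYMMDD) delimited by
# underscores, as in the example files shipped with the package.
decode_ouv_date <- function(files) {
  file.names <- basename(files)
  # Extract the eight-digit field; names that do not match are left
  # unchanged and are converted to NA by as.Date() below.
  date.strings <- sub("^.*_([0-9]{8})_.*$", "\\1", file.names)
  as.Date(date.strings, format = "%Y%m%d")
}

decode_ouv_date("some/folder/O3MOUV_L3_20240621_v02p02.HDF5")
decode_ouv_date("renamed-file.HDF5") # NA: no date in the name
```

In real use, sUV_date_OUV_hdf5() should be preferred; the sketch only shows why the provider's file names need to be kept.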

Note

The constraint on the consistency among all files to be read allows very fast reading into a single data frame. If the files differ in the grid or set of variables, sUV_read_OUV_hdf5() can be used to read the files individually into separate data frames. These data frames can later be row-bound together.
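Row-binding data frames that differ in their variables requires some care, because base rbind() expects the same columns in every data frame. A minimal base R sketch follows. The helper rbind_fill() is hypothetical, and the two small data frames are toy stand-ins with made-up values for what separate calls to sUV_read_OUV_hdf5(), one per file, might return.

```r
# Hypothetical helper: row-bind data frames whose columns differ,
# adding any column absent from a data frame as NA.
rbind_fill <- function(df.list) {
  all.cols <- unique(unlist(lapply(df.list, names)))
  padded <- lapply(df.list, function(df) {
    for (col in setdiff(all.cols, names(df))) {
      df[[col]] <- NA_real_
    }
    df[ , all.cols, drop = FALSE] # same column order in all
  })
  do.call(rbind, padded)
}

# Toy stand-ins (made-up values) for data frames read one file at a time
day1.tb <- data.frame(Date = as.Date("2024-06-21"),
                      Longitude = c(-10.75, -10.25),
                      Latitude = 35.25,
                      DailyDoseUva = c(1500, 1600),
                      DailyDoseUvb = c(35, 38))
day2.tb <- data.frame(Date = as.Date("2024-06-22"),
                      Longitude = c(-10.75, -10.25),
                      Latitude = 35.25,
                      DailyDoseUva = c(1550, 1650)) # no DailyDoseUvb

combined.tb <- rbind_fill(list(day1.tb, day2.tb))
dim(combined.tb)
```

With real data, the list passed to rbind_fill() could be built with lapply() over the file names. Whether the metadata attributes of the variables survive row-binding is not checked in this sketch.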

Variable QualityFlags is encoded as 64-bit integers in the HDF5 file and read as a double. R package 'bit64' can be used to print these values as 64-bit integers.

When requesting the data from the EUMETSAT AC SAF FMI server at https://acsaf.org/, it is possible to select the range of latitudes and longitudes and the variables to be included in the files. This is more efficient than doing the selection after importing the data into R. The data are returned as a .zip compressed file containing one .HDF5 file for each day in the range of dates selected. For world coverage, each of these files can be as large as 10 MB, depending on how many variables it contains. Because HDF5 files are binary, a data.frame object containing one year of data can occupy a few tens of GB of RAM.
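A rough estimate of the RAM needed can be made before downloading a long series. The sketch below rests on assumptions that are not stated by the data provider here: a global grid with 0.5 degree spacing (the spacing of the example files for Spain), every column stored as an 8-byte double, and no allowance for temporary copies made while reading. The helper name estimate_ram_gb() is hypothetical.

```r
# Hypothetical helper: approximate size in GB (1e9 bytes) of a data frame
# holding n.days of gridded data with n.cols columns of doubles.
estimate_ram_gb <- function(n.cols,
                            n.days = 365,
                            grid.step = 0.5) { # degrees, assumed
  n.cells <- (360 / grid.step) * (180 / grid.step)
  bytes <- n.cells * n.days * n.cols * 8 # 8 bytes per double
  bytes / 1e9
}

# Eight columns, as when reading all variables of the example files
estimate_ram_gb(n.cols = 8)
# A smaller region or fewer variables reduce the estimate proportionally
estimate_ram_gb(n.cols = 5)
```

Under these assumptions one year of worldwide data with eight columns needs about 6 GB, and files with more variables scale this up in proportion, which is why selecting the region and variables at download time pays off.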

This function is fast as long as there is enough RAM available to hold the data frame and the files are read from a reasonably fast SSD. The example data included in the package cover only Spain and five summer days. They are used in the examples and automated tests. Function sUV_read_OUV_hdf5() has also been tested by importing one year's worth of data with worldwide coverage on a PC with 64 GB of RAM.

References

Kujanpää, J. (2019) PRODUCT USER MANUAL Offline UV Products v2 (IDs: O3M-450 - O3M-464) and Data Record R1 (IDs: O3M-138 - O3M-152). Ref. SAF/AC/FMI/PUM/001. 18 pp. EUMETSAT AC SAF.

See also

sUV_read_OUV_txt() supporting the same Surface UV data stored in text files as single-location time series.

Examples

# find location of one example file
one.file.name <-
   system.file("extdata", "O3MOUV_L3_20240621_v02p02.HDF5",
               package = "surfaceuv", mustWork = TRUE)

# available variables
sUV_vars_OUV_hdf5(one.file.name)
#> [1] "Date"                "Longitude"           "Latitude"           
#> [4] "DailyDoseUva"        "DailyDoseUvb"        "DailyMaxDoseRateUva"
#> [7] "DailyMaxDoseRateUvb" "QualityFlags"       

# available grid
sUV_grid_OUV_hdf5(one.file.name)
#>   Longitude Latitude
#> 1    -10.75    35.25
#> 2     -4.75    43.25

# decode date from file name
sUV_date_OUV_hdf5(one.file.name)
#> O3MOUV_L3_20240621_v02p02.HDF5 
#>                   "2024-06-21" 
sUV_date_OUV_hdf5(one.file.name, use.names = FALSE)
#> [1] "2024-06-21"

# read all variables
midsummer_spain.tb <- sUV_read_OUV_hdf5(one.file.name)
dim(midsummer_spain.tb)
#> [1] 221   8
summary(midsummer_spain.tb)
#>       Date              Longitude         Latitude      DailyDoseUva   
#>  Min.   :2024-06-21   Min.   :-10.75   Min.   :35.25   Min.   : 836.7  
#>  1st Qu.:2024-06-21   1st Qu.: -9.25   1st Qu.:37.25   1st Qu.:1496.4  
#>  Median :2024-06-21   Median : -7.75   Median :39.25   Median :1660.4  
#>  Mean   :2024-06-21   Mean   : -7.75   Mean   :39.25   Mean   :1585.9  
#>  3rd Qu.:2024-06-21   3rd Qu.: -6.25   3rd Qu.:41.25   3rd Qu.:1743.2  
#>  Max.   :2024-06-21   Max.   : -4.75   Max.   :43.25   Max.   :1798.3  
#>   DailyDoseUvb   DailyMaxDoseRateUva DailyMaxDoseRateUvb  QualityFlags
#>  Min.   :18.78   Min.   :28972       Min.   : 807.6      Min.   :0    
#>  1st Qu.:34.13   1st Qu.:51486       1st Qu.:1456.1      1st Qu.:0    
#>  Median :38.32   Median :56515       Median :1642.2      Median :0    
#>  Mean   :36.40   Mean   :53981       Mean   :1564.3      Mean   :0    
#>  3rd Qu.:40.44   3rd Qu.:58781       3rd Qu.:1719.5      3rd Qu.:0    
#>  Max.   :43.69   Max.   :60026       Max.   :1887.8      Max.   :0    

# read two variables
midsummer_spain_daily.tb <-
  sUV_read_OUV_hdf5(one.file.name,
                    vars.to.read = c("DailyDoseUva", "DailyDoseUvb"))
dim(midsummer_spain_daily.tb)
#> [1] 221   5
summary(midsummer_spain_daily.tb)
#>       Date              Longitude         Latitude      DailyDoseUva   
#>  Min.   :2024-06-21   Min.   :-10.75   Min.   :35.25   Min.   : 836.7  
#>  1st Qu.:2024-06-21   1st Qu.: -9.25   1st Qu.:37.25   1st Qu.:1496.4  
#>  Median :2024-06-21   Median : -7.75   Median :39.25   Median :1660.4  
#>  Mean   :2024-06-21   Mean   : -7.75   Mean   :39.25   Mean   :1585.9  
#>  3rd Qu.:2024-06-21   3rd Qu.: -6.25   3rd Qu.:41.25   3rd Qu.:1743.2  
#>  Max.   :2024-06-21   Max.   : -4.75   Max.   :43.25   Max.   :1798.3  
#>   DailyDoseUvb  
#>  Min.   :18.78  
#>  1st Qu.:34.13  
#>  Median :38.32  
#>  Mean   :36.40  
#>  3rd Qu.:40.44  
#>  Max.   :43.69  

# find location of three example files
three.file.names <-
   system.file("extdata",
               c("O3MOUV_L3_20240621_v02p02.HDF5",
                 "O3MOUV_L3_20240622_v02p02.HDF5",
                 "O3MOUV_L3_20240623_v02p02.HDF5"),
               package = "surfaceuv", mustWork = TRUE)

sUV_date_OUV_hdf5(three.file.names)
#> O3MOUV_L3_20240621_v02p02.HDF5 O3MOUV_L3_20240622_v02p02.HDF5 
#>                   "2024-06-21"                   "2024-06-22" 
#> O3MOUV_L3_20240623_v02p02.HDF5 
#>                   "2024-06-23" 

summer_3days_spain.tb <- sUV_read_OUV_hdf5(three.file.names)
dim(summer_3days_spain.tb)
#> [1] 663   8
summary(summer_3days_spain.tb)
#>       Date              Longitude         Latitude      DailyDoseUva   
#>  Min.   :2024-06-21   Min.   :-10.75   Min.   :35.25   Min.   : 275.6  
#>  1st Qu.:2024-06-21   1st Qu.: -9.25   1st Qu.:37.25   1st Qu.:1487.7  
#>  Median :2024-06-22   Median : -7.75   Median :39.25   Median :1657.2  
#>  Mean   :2024-06-22   Mean   : -7.75   Mean   :39.25   Mean   :1578.1  
#>  3rd Qu.:2024-06-23   3rd Qu.: -6.25   3rd Qu.:41.25   3rd Qu.:1732.6  
#>  Max.   :2024-06-23   Max.   : -4.75   Max.   :43.25   Max.   :1799.5  
#>   DailyDoseUvb    DailyMaxDoseRateUva DailyMaxDoseRateUvb  QualityFlags
#>  Min.   : 7.162   Min.   : 9891       Min.   : 314.2      Min.   :0    
#>  1st Qu.:34.555   1st Qu.:51749       1st Qu.:1481.9      1st Qu.:0    
#>  Median :37.511   Median :56457       Median :1617.6      Median :0    
#>  Mean   :36.186   Mean   :53711       Mean   :1554.9      Mean   :0    
#>  3rd Qu.:39.713   3rd Qu.:58218       3rd Qu.:1697.4      3rd Qu.:0    
#>  Max.   :43.690   Max.   :60026       Max.   :1887.8      Max.   :0