Data Pipeline: swepy.pipeline
- class swepy.pipeline.Swepy(working_dir=None, ul=None, lr=None, outfile19='all_days_19H.nc', outfile37='all_days_37H.nc', high_res=True)
Bases: object
Class to facilitate the scraping, subsetting, and concatenating of brightness temperature (Tb) files for SWE analysis.
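For orientation, a minimal instantiation might look like the sketch below. The working directory and corner values are illustrative placeholders; treating ul and lr as [latitude, longitude] pairs follows the set_grid() and get_xy() descriptions further down.

    from swepy.pipeline import Swepy

    # Illustrative values only: a local working directory plus upper-left
    # and lower-right [lat, lon] corners of an area of interest.
    swe = Swepy(
        working_dir="/tmp/swe_analysis",
        ul=[66.0, -145.0],
        lr=[64.0, -140.0],
    )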
- check_params()
Helper function to check that all class members are set before attempting to web scrape or subset. Used by the test suite and to verify that parameters are set before scraping.
- clean_dirs()
Delete the files in the data directories. Useful for cleaning up during repeated testing.
- concatenate(outname19=None, outname37=None, all=False)
Concatenate the files in the subsetted data folders. The all parameter exists only to allow this function to be nested inside the full workflow.
- Parameters
outname19 (str) – output file name for the 19GHz data
outname37 (str) – output file name for the 37GHz data
all (Boolean) – under-the-hood flag used when the call is nested in the full workflow
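A usage sketch, assuming a Swepy instance swe configured as in the example above and subsetted files already present under the working directory; the output names are illustrative, and the return value is not captured because it is not documented here.

    # Concatenate previously subsetted 19GHz and 37GHz files into
    # single output files (names are placeholders).
    swe.concatenate(outname19="ak_19H.nc", outname37="ak_37H.nc")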
- convert_netcdf_zarr(outname19='zarr19', outname37='zarr37')
Convert the netCDF files into zarr directories for storage in S3.
- Parameters
outname19 (str, optional) – name of the directory to store the 19H file
outname37 (str, optional) – name of the directory to store the 37H file
- Returns
dictionary with the file name as key and the generated zarr object as value: {outname19: zarr_obj, outname37: zarr_obj}
- Return type
dict
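A short sketch, assuming concatenated netCDF files already exist for an instance swe as above; the directory names are the documented defaults, and the loop simply inspects the returned mapping.

    # Convert the concatenated netCDF files to zarr stores and look at
    # the returned {directory_name: zarr_object} dictionary.
    zarrs = swe.convert_netcdf_zarr(outname19="zarr19", outname37="zarr37")
    for name, store in zarrs.items():
        print(name, type(store))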
- final_concat()
Manage the final concatenation for scrape_all.
- get_directories(path)
Given a working directory, create the data directories if they do not already exist.
- Parameters
path (str) – working directory in which to create the data directories
- get_file(date, channel)
Use the date and channel to find the optimal file composition and return the file parameters for the web scraper's use.
- Parameters
date (datetime) – date to find the file path for
channel (str) – 19H vs 37H channel
- get_grid(lat1, lat2)
Check which region the latitudes fall into and, based on that grid, instantiate the EASE-Grid conversion object.
- Parameters
lat1 (int) – upper-left latitude
lat2 (int) – lower-right latitude
- get_sensor(date)
Helper function to return the optimal sensor for a given date.
- Parameters
date (datetime.date) – date to find sensor information for
- get_xy(ll_ul, ll_lr)
Use NSIDC scripts to convert user-input lat/lon into EASE-Grid 2.0 coordinates.
- Parameters
ll_ul ([float, float]) – latitude and longitude of the upper-left corner
ll_lr ([float, float]) – latitude and longitude of the lower-right corner
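For example, converting an illustrative pair of corners might look like the line below; the corner values are made up, and the structure of the returned coordinates is not documented here, so the result is only captured.

    # Convert [lat, lon] corners to EASE-Grid 2.0 x/y coordinates.
    xy = swe.get_xy(ll_ul=[66.0, -145.0], ll_lr=[64.0, -140.0])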
- static safe_subtract(tb19, tb37)
Check the size of each array; the 19GHz and 37GHz matrices are often one unit off from each other. Chops the larger matrix to match the smaller matrix.
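The sketch below illustrates the shape mismatch with synthetic NumPy arrays that differ by one row; the values are random placeholders, and the assumption that the method returns the trimmed difference of the two inputs follows from its name rather than from this page.

    import numpy as np
    from swepy.pipeline import Swepy

    # Synthetic brightness-temperature arrays, one row off from each other,
    # mimicking the mismatch described above.
    tb19 = np.random.rand(152, 153)
    tb37 = np.random.rand(151, 153)

    # Assumed to trim the larger array and return the difference.
    diff = Swepy.safe_subtract(tb19, tb37)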
- scrape(dates=None)
Wrapper function to interface between swepy and nD.
- Parameters
dates (list(datetime)) – list of dates to scrape
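A sketch of scraping an explicit list of dates, assuming an instance swe whose login and grid have already been set; the dates are illustrative.

    import datetime

    # Scrape three consecutive (illustrative) days.
    dates = [datetime.datetime(2010, 1, day) for day in (1, 2, 3)]
    swe.scrape(dates=dates)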
- scrape_all()
Ensure that every year in the date range is scraped, subsetted, and concatenated; implements the whole workflow.
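Putting the pieces together, an end-to-end sketch might look like the following; the credentials, dates, and coordinates are placeholders, and the setter calls are the ones documented below.

    import datetime
    from swepy.pipeline import Swepy

    swe = Swepy(working_dir="/tmp/swe_analysis")
    swe.set_login(username="my_earthdata_user", password="my_earthdata_pass")
    swe.set_dates(start=datetime.datetime(2005, 1, 1),
                  end=datetime.datetime(2006, 12, 31))
    swe.set_grid(ul=[66.0, -145.0], lr=[64.0, -140.0])

    # Scrape, subset, and concatenate the full date range.
    swe.scrape_all()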
- set_dates(start=None, end=None)
Set the date range using start and end datetime objects.
- Parameters
start (datetime) – start date for scraping
end (datetime) – end date for scraping
- set_grid(ul=None, lr=None)
Set the grid corners and convert them to x/y coordinates.
- Parameters
ul (str or [float, float]) – upper-left bounding coordinates, or a grid name (N, S, T) for an entire grid
lr ([float, float]) – lower-right bounding coordinates (not needed when an entire grid is selected)
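The grid-name form, which the workflow sketch above does not show, would look roughly like this; whether a bare grid name such as "N" selects an entire grid in one call is inferred from the parameter description, so treat it as an assumption.

    # Select an entire grid by name instead of passing corner coordinates.
    swe.set_grid(ul="N")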
- set_login(username=None, password=None)
Set login credentials and log in to Earthdata.
- Parameters
username (str) – Earthdata username
password (str) – Earthdata password
- subset(scrape=False, in_dir=None, out_dir19=None, out_dir37=None)
Get the files from the wget directory and subset them geographically based on the coordinates from the constructor.
- Parameters
scrape (Boolean) – under-the-hood variable to allow the automatic workflow
in_dir (str, optional) – directory containing the input data. Default: "working_dir/data/wget"
out_dir19 (str, optional) – directory to store the output 19GHz files. Default: "working_dir/data/Subsetted_19H/"
out_dir37 (str, optional) – directory to store the output 37GHz files. Default: "working_dir/data/Subsetted_37H"
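A short sketch, assuming files have already been scraped into the default wget directory for an instance swe as above; all arguments are left at their defaults.

    # Geographically subset previously scraped files using the bounding
    # coordinates supplied to the constructor.
    swe.subset()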