Data Pipeline: swepy.pipeline
-
class swepy.pipeline.Swepy(working_dir=None, ul=None, lr=None, outfile19='all_days_19H.nc', outfile37='all_days_37H.nc', high_res=True)
Bases: object
Class to facilitate the scraping, subsetting, and concatenating of brightness temperature (tB) files for snow water equivalent (SWE) analysis.
-
check_params()
Helper function that checks that all class members are set before attempting to web-scrape or subset. Used by the test suite and to verify that parameters are set before scraping.
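The kind of check described above can be sketched in plain Python. This is a minimal, hypothetical sketch rather than swepy's implementation: `missing_params` and `check_params_sketch` are illustrative names, and the member names in the usage lines (`working_dir`, `ul`, `lr`, `dates`) are assumptions about which values must be set.

```python
from typing import Any, Dict, List


def missing_params(params: Dict[str, Any]) -> List[str]:
    """Return the names of parameters that are still unset (None).

    Hypothetical helper: the real check_params() may inspect different
    members or report problems differently.
    """
    return [name for name, value in params.items() if value is None]


def check_params_sketch(params: Dict[str, Any]) -> bool:
    """True only when every required parameter has been set."""
    return not missing_params(params)


# Assumed member names; "dates" has not been set yet.
params = {"working_dir": "/tmp/swe", "ul": [66.0, -145.0], "lr": [64.0, -140.0], "dates": None}
print(missing_params(params))  # ['dates']
```

Returning the list of missing names, rather than only a boolean, makes it easy to tell the user which setter (for example set_dates() or set_grid()) still needs to be called.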
-
clean_dirs()
Delete the files in the data directories. Useful for cleaning up during repeated testing.
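A minimal sketch of this kind of cleanup using the standard library's `pathlib`. `clean_dirs_sketch` is a hypothetical name, and which directories the real method empties is an assumption; here the caller passes them in. The sketch removes regular files only, leaves subdirectories in place, and skips directories that do not exist.

```python
from pathlib import Path
from typing import Iterable, Union


def clean_dirs_sketch(dirs: Iterable[Union[str, Path]]) -> int:
    """Delete regular files in each directory and return how many were removed."""
    removed = 0
    for d in dirs:
        d = Path(d)
        if not d.is_dir():
            continue  # nothing to clean in a missing directory
        # Materialize the listing first so we are not deleting while iterating.
        for entry in list(d.iterdir()):
            if entry.is_file():
                entry.unlink()
                removed += 1
    return removed
```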
-
concatenate(outname19=None, outname37=None, all=False)
Concatenate the files in the subsetted data folders. The input parameter exists only to allow for nesting of functions.
- Parameters
outname19 (str) – output file name for the 19GHz channel
outname37 (str) – output file name for the 37GHz channel
all (Boolean) –
-
convert_netcdf_zarr(outname19='zarr19', outname37='zarr37')
Convert the netCDF files into Zarr directories for storage in S3.
- Parameters
outname19 (str, optional) – name of the directory in which to store the 19H file
outname37 (str, optional) – name of the directory in which to store the 37H file
- Returns
dictionary with each output name as the key and the generated Zarr object as the value, i.e. {outname19: zarr_obj, outname37: zarr_obj}
- Return type
dict
-
final_concat()
Manage the final concatenation for scrape_all().
-
get_directories(path)
Given a working directory, create the data directories if they do not already exist.
- Parameters
path (str) – working directory in which to create the data directories
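A minimal sketch of idempotent directory creation with `os.makedirs(..., exist_ok=True)`. `get_directories_sketch` is a hypothetical name; the three directory names are taken from the defaults documented for subset(), and the real method may create others as well.

```python
import os
from typing import Dict


def get_directories_sketch(path: str) -> Dict[str, str]:
    """Create the data directories under `path` if they do not exist.

    Names follow the documented subset() defaults: data/wget,
    data/Subsetted_19H and data/Subsetted_37H (an assumption about what
    the real get_directories() creates).
    """
    dirs = {
        "wget": os.path.join(path, "data", "wget"),
        "19H": os.path.join(path, "data", "Subsetted_19H"),
        "37H": os.path.join(path, "data", "Subsetted_37H"),
    }
    for d in dirs.values():
        # exist_ok=True leaves existing directories untouched, so the call
        # can safely be repeated on the same working directory.
        os.makedirs(d, exist_ok=True)
    return dirs
```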
-
get_file(date, channel)
Use the date and channel to find the optimal file composition and return the file parameters for the web scraper to use.
- Parameters
date (datetime) – date to find the file path for
channel (str) – 19H vs 37H channel
-
get_grid(lat1, lat2)
Check which grid regions the latitudes fall into and, based on the grid, instantiate the EASE-Grid conversion object.
- Parameters
lat1 (int) – upper-left latitude
lat2 (int) – lower-right latitude
-
get_sensor(date)
Helper function that returns the optimal sensor for a given date.
- Parameters
date (datetime.date) – date for which to find sensor information
-
get_xy(ll_ul, ll_lr)
Use NSIDC scripts to convert the user-supplied latitude/longitude into EASE-Grid 2.0 coordinates.
- Parameters
ll_ul ([float, float]) – latitude and longitude of the upper-left corner
ll_lr ([float, float]) – latitude and longitude of the lower-right corner
-
static safe_subtract(tb19, tb37)
Check the size of each file, since the 19GHz and 37GHz matrices often differ in size by one unit. Chops the larger matrix to match the smaller one.
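The shape matching described above can be sketched with NumPy. `safe_subtract_sketch` is a hypothetical name; trimming from the end of each axis and returning the difference `tb19 - tb37` (rather than the two trimmed arrays) are assumptions, since the text only says that the larger matrix is chopped to match the smaller one.

```python
import numpy as np


def safe_subtract_sketch(tb19, tb37):
    """Trim tb19 and tb37 to their common shape, then return tb19 - tb37.

    Hypothetical sketch: assumes both inputs have the same number of
    dimensions and that any extra cells sit at the end of an axis.
    """
    tb19 = np.asarray(tb19)
    tb37 = np.asarray(tb37)
    if tb19.ndim != tb37.ndim:
        raise ValueError("tb19 and tb37 must have the same number of dimensions")
    # Smallest extent along each axis, e.g. (3, 5, 5) and (3, 4, 5) -> (3, 4, 5).
    common = tuple(min(a, b) for a, b in zip(tb19.shape, tb37.shape))
    window = tuple(slice(0, n) for n in common)
    # Cast to float so unsigned-integer inputs cannot wrap around on subtraction.
    return tb19[window].astype(float) - tb37[window].astype(float)


tb19 = np.full((3, 5, 5), 250.0)
tb37 = np.full((3, 4, 5), 230.0)
diff = safe_subtract_sketch(tb19, tb37)
print(diff.shape)  # (3, 4, 5)
```

The float cast is a deliberate choice: brightness temperatures stored as unsigned integers would otherwise wrap around whenever the 37GHz value exceeds the 19GHz value.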
-
scrape(dates=None)
Wrapper function that interfaces between swepy and nD.
- Parameters
dates (list of datetime) – list of dates to scrape
-
scrape_all()
Run the whole workflow, ensuring that every year is subset and concatenated.
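One way to organize a per-year workflow like this is to split the requested date range into calendar-year chunks, run scraping, subsetting, and concatenation on each chunk, and leave final_concat() to merge the yearly outputs. The chunking below is a hypothetical sketch: `split_by_year` is an illustrative name, and calendar years (rather than, say, water years) are an assumption. It works on `datetime.date` values, so convert `datetime` objects with `.date()` first; comparing a `datetime` with a `date` raises `TypeError`.

```python
from datetime import date
from typing import List, Tuple


def split_by_year(start: date, end: date) -> List[Tuple[date, date]]:
    """Split the inclusive range [start, end] into per-calendar-year chunks."""
    if start > end:
        raise ValueError("start must not be after end")
    chunks = []
    for year in range(start.year, end.year + 1):
        # Clip each calendar year to the requested range.
        chunk_start = max(start, date(year, 1, 1))
        chunk_end = min(end, date(year, 12, 31))
        chunks.append((chunk_start, chunk_end))
    return chunks
```

For example, a range from 2010-11-01 to 2012-02-28 yields three chunks: the end of 2010, all of 2011, and the start of 2012.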
-
set_dates(start=None, end=None)
Set the date range using start and end datetime objects.
- Parameters
start (datetime) – start date for scraping
end (datetime) – end date for scraping
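Because scrape() takes a list of dates, a start/end range like this one has to be expanded into individual days at some point. A minimal standard-library sketch; `daily_dates` is a hypothetical helper, and treating the end date as inclusive is an assumption.

```python
from datetime import datetime, timedelta
from typing import List


def daily_dates(start: datetime, end: datetime) -> List[datetime]:
    """Return every day from start to end, inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    n_days = (end - start).days
    return [start + timedelta(days=i) for i in range(n_days + 1)]


# 2016 is a leap year: Feb 27, Feb 28, Feb 29, Mar 1.
print(len(daily_dates(datetime(2016, 2, 27), datetime(2016, 3, 1))))  # 4
```

Letting `timedelta` do the stepping means month lengths and leap days are handled without any calendar logic of our own.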
-
set_grid(ul=None, lr=None)
Set the grid corners and convert them to x/y coordinates.
- Parameters
ul (str or [float, float]) – upper-left bounding coordinates, or a grid name (N, S, or T)
lr ([float, float]) – lower-right bounding coordinates (not needed when an entire grid is selected)
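Since ul accepts either a grid name or a coordinate pair, the argument handling can be sketched as below. `parse_grid_args` is a hypothetical helper; requiring lr only for bounding boxes follows the parameter descriptions, while returning a dict is an assumption. The real set_grid() goes on to convert the corners to x/y, which this sketch does not attempt.

```python
from typing import List, Optional, Union

# Whole-grid names listed for the ul parameter.
GRID_NAMES = {"N", "S", "T"}


def parse_grid_args(
    ul: Union[str, List[float], None], lr: Optional[List[float]] = None
) -> dict:
    """Classify set_grid-style arguments as a whole grid or a bounding box."""
    if isinstance(ul, str):
        if ul not in GRID_NAMES:
            raise ValueError(f"unknown grid name {ul!r}; expected one of N, S, T")
        # A named grid covers the entire grid, so lr is not needed.
        return {"mode": "grid", "grid": ul}
    if ul is None or lr is None:
        raise ValueError("a bounding box needs both ul and lr")
    if len(ul) != 2 or len(lr) != 2:
        raise ValueError("corners must be [latitude, longitude] pairs")
    return {"mode": "bbox", "ul": [float(v) for v in ul], "lr": [float(v) for v in lr]}


print(parse_grid_args("N"))  # {'mode': 'grid', 'grid': 'N'}
```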
-
set_login(username=None, password=None)
Set the login credentials and log in to Earthdata.
- Parameters
username (str) – Earthdata username
password (str) – Earthdata password
-
subset(scrape=False, in_dir=None, out_dir19=None, out_dir37=None)
Get the files from the wget directory and subset them geographically, based on the coordinates passed to the constructor.
- Parameters
scrape (Boolean) – internal flag used by the automated workflow
in_dir (str) – (Optional) directory containing the input data. Default: “working_dir/data/wget”
out_dir19 (str) – (Optional) directory in which to store the output 19GHz files. Default: “working_dir/data/Subsetted_19H/”
out_dir37 (str) – (Optional) directory in which to store the output 37GHz files. Default: “working_dir/data/Subsetted_37H”
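Once the corners are known as grid indices, geographic subsetting reduces to array slicing. A minimal NumPy sketch; `subset_sketch` is a hypothetical name, and it assumes the corners have already been converted (for example by get_xy()) into integer row/column indices with row 0 at the top of the grid, which may not match how swepy represents them internally.

```python
import numpy as np


def subset_sketch(tb, row_ul, col_ul, row_lr, col_lr):
    """Return the window of tb between two corner cells (inclusive).

    tb is assumed to have shape (..., rows, cols), e.g. (time, rows, cols).
    """
    tb = np.asarray(tb)
    if row_ul > row_lr or col_ul > col_lr:
        raise ValueError("upper-left corner must be above and left of lower-right corner")
    # +1 because Python slices exclude their stop index.
    return tb[..., row_ul:row_lr + 1, col_ul:col_lr + 1]


tb = np.arange(2 * 6 * 8).reshape(2, 6, 8)  # 2 days of a 6x8 grid
window = subset_sketch(tb, 1, 2, 3, 5)
print(window.shape)  # (2, 3, 4)
```

Using `...` in the index lets the same function handle a single 2-D day or a stacked (time, rows, cols) cube.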
-