Data Pipeline: swepy.pipeline

class swepy.pipeline.Swepy(working_dir=None, ul=None, lr=None, outfile19='all_days_19H.nc', outfile37='all_days_37H.nc', high_res=True)

Bases: object

Class to facilitate scraping, subsetting, and concatenating brightness temperature (Tb) files for SWE analysis.
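
Example (a minimal instantiation sketch; the working directory and corner coordinates are illustrative values, not library defaults):

>>> from swepy.pipeline import Swepy
>>> # hypothetical [lat, lon] corners for an upper-left / lower-right bounding box
>>> s = Swepy(working_dir=".", ul=[66.0, -145.5], lr=[63.0, -139.0])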

check_params()

Helper function to check that all class members are set before attempting to web scrape or subset.

Used by the test suite and to verify that parameters are set before scraping.

clean_dirs()

Delete files in the data directories. Useful for cleaning up during repeated testing.

concatenate(outname19=None, outname37=None, all=False)

Concatenate the files in the subsetted data folders. The input parameters exist primarily to allow nesting within other functions.

Parameters
  • outname19 (str) – output file name for 19GHz

  • outname37 (str) – output file name for 37GHz

  • all (Boolean) – internal flag used when concatenate is nested inside the scrape_all workflow
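
For example, merging the subsetted daily files into output files named after the documented defaults:

>>> s.concatenate(outname19="all_days_19H.nc", outname37="all_days_37H.nc")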

convert_netcdf_zarr(outname19='zarr19', outname37='zarr37')

Convert netCDF files into zarr directories for storage in S3

Parameters
  • outname19 (string (optional)) – name of the directory to store 19H file

  • outname37 (string (optional)) – name of the directory to store 37H file

Returns

dictionary with the output name as key and the generated zarr object as the value, e.g. {outname19: zarr_obj, outname37: zarr_obj}

Return type

dict
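
A sketch using the documented default directory names:

>>> zarrs = s.convert_netcdf_zarr(outname19="zarr19", outname37="zarr37")
>>> # zarrs maps each name to its zarr object: {"zarr19": zarr_obj, "zarr37": zarr_obj}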

final_concat()

Manage the final concatenation for scrape_all

get_directories(path)

Given a working directory, create data directories if non-existent

Parameters

path (str) – working directory in which to create the data directories

get_file(date, channel)

Use the date and channel to find the optimal file composition, and return the file parameters for the web scraper's use.

Parameters
  • date (datetime) – date to find the file path for

  • channel (str) – 19H vs 37H channel
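
For example (the date here is illustrative):

>>> import datetime
>>> file_params = s.get_file(datetime.datetime(2010, 1, 1), "19H")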

get_grid(lat1, lat2)

Check which region the latitudes fall into, and instantiate the EASE-Grid 2.0 conversion object based on that grid.

Parameters
  • lat1 (int) – Upper Left Latitude

  • lat2 (int) – Lower Right Latitude

get_sensor(date)

Helper function to return optimal sensor for a given date

Parameters

date (datetime.date) – date to find sensor information for
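
For example (illustrative date):

>>> import datetime
>>> s.get_sensor(datetime.date(2010, 1, 1))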

get_xy(ll_ul, ll_lr)

Use NSIDC scripts to convert user-input lat/lon coordinates into EASE-Grid 2.0 coordinates

Parameters
  • ll_ul ([float, float]) – Latitude and longitude for upper left coordinates

  • ll_lr ([float, float]) – Latitude and longitude for lower right coordinates
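
For example, with illustrative [lat, lon] pairs:

>>> s.get_xy([66.0, -145.5], [63.0, -139.0])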

static safe_subtract(tb19, tb37)

Check the size of each file; the 19GHz and 37GHz matrices are often one unit off from each other.

Chops the larger matrix to match the smaller matrix
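
A sketch of the intent, assuming NumPy arrays; the shapes below are illustrative, not a documented layout:

>>> import numpy as np
>>> tb19 = np.zeros((365, 80, 100))  # one row larger than tb37
>>> tb37 = np.zeros((365, 79, 100))
>>> diff = Swepy.safe_subtract(tb19, tb37)  # larger matrix is chopped before subtracting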

scrape(dates=None)

Wrapper function to interface between swepy and nD

Parameters

dates (list(datetime)) – list of dates to scrape data for
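
For example, scraping two illustrative dates:

>>> import datetime
>>> s.scrape(dates=[datetime.datetime(2010, 1, 1), datetime.datetime(2010, 1, 2)])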

scrape_all()

Scrape, subset, and concatenate every year of data; implements the whole workflow.
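
A sketch of the full workflow, assuming login, grid, and date range are set first; all values are illustrative:

>>> import datetime
>>> s = Swepy(working_dir=".")
>>> s.set_login(username="user", password="pass")  # hypothetical Earthdata credentials
>>> s.set_grid(ul=[66.0, -145.5], lr=[63.0, -139.0])
>>> s.set_dates(start=datetime.datetime(2010, 1, 1), end=datetime.datetime(2010, 12, 31))
>>> s.scrape_all()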

set_dates(start=None, end=None)

Set date range using start and end datetime objects

Parameters
  • start (datetime) – start date for scraping

  • end (datetime) – end date for scraping
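
For example, a one-year range (dates illustrative):

>>> import datetime
>>> s.set_dates(start=datetime.datetime(2010, 1, 1), end=datetime.datetime(2010, 12, 31))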

set_grid(ul=None, lr=None)

Set grid corners and convert them to x/y coordinates

Parameters
  • ul (str or [float, float]) – upper left bounding coordinates, or a grid name ("N", "S", or "T")

  • lr ([float, float]) – lower right bounding coordinates (not needed for an entire grid)
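
For example, either an entire named grid or explicit corners (coordinates illustrative):

>>> s.set_grid(ul="N")  # entire northern grid; lr not needed
>>> s.set_grid(ul=[66.0, -145.5], lr=[63.0, -139.0])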

set_login(username=None, password=None)

Set login credentials and log in to Earthdata

Parameters
  • username (str) – Earthdata username

  • password (str) – Earthdata password
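
For example (hypothetical credentials):

>>> s.set_login(username="my_user", password="my_pass")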

subset(scrape=False, in_dir=None, out_dir19=None, out_dir37=None)

Get the files from the wget directory and subset them geographically based on the coordinates from the constructor

Parameters
  • scrape (Boolean) – internal flag used to enable the automated workflow

  • in_dir (str) – (Optional) directory with input data stored in it. Default: “working_dir/data/wget”

  • out_dir19 (str) – (Optional) directory to store output 19GHz files. Default: “working_dir/data/Subsetted_19H/”

  • out_dir37 (str) – (Optional) directory to store output 37GHz files. Default: “working_dir/data/Subsetted_37H/”
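
For example, subsetting with the documented default directories:

>>> s.subset()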