xagg package
Submodules
xagg.auxfuncs module
- xagg.auxfuncs.fix_ds(ds, var_cipher={'Lat': {'Lat': 'lat', 'Lon': 'lon'}, 'Latitude': {'Latitude': 'lat', 'Longitude': 'lon'}, 'Y': {'X': 'lon', 'Y': 'lat'}, 'latitude': {'latitude': 'lat', 'longitude': 'lon'}, 'latitude_1': {'latitude_1': 'lat', 'longitude_1': 'lon'}, 'nav_lat': {'nav_lat': 'lat', 'nav_lon': 'lon'}, 'y': {'x': 'lon', 'y': 'lat'}}, chg_bnds=True)[source]
Puts the input ds into a format compatible with the rest of the package
grid variables are renamed “lat” and “lon”
the lon dimension is made -180:180 to be consistent with most geographic data (as well as any lon_bnds variable if
chg_bnds=True(by default))the dataset is sorted in ascending order in both lat and lon
NOTE: there probably should be a safeguard in case “y” and “x” are multiindex dimension names instead of lat/lon names… maybe a warning for now… (TO DO)
- Parameters:
- ds
xarray.Dataset an input
xarray.Dataset, which may or may not need adjustment to be compatible with this package- var_cipherdict, optional
a dict of dicts for renaming lat/lon variables to “lat”/”lon”. The form is
{search_str:{lat_name:'lat',lon_name:'lon'},...}, the code looks for search_str in the dimensions of the ds; based on that, it renames lat_name to ‘lat’ and lon_name to ‘lon’. Common names for these variables (‘latitude’, ‘Latitude’, ‘Lat’, ‘latitude_1’,’nav_lat’,’Y’) are included out of the box.- chg_bndsbool, optional, default =
True if
True, the names of variables with “_bnd” in their names are assumed to be dimension bound variables, and are changed as well if the rest of their name matches ‘o’ (for lon) or ‘a’ (for lat). ## DOES THIS WORK FOR “X” and “Y”?
- ds
- Returns:
- ds
xarray.Dataset a dataset with lat/lon variables in the format necessary for this package to function
- ds
- xagg.auxfuncs.get_bnds(ds, wrap_around_thresh='dynamic', break_window_width=3, break_thresh_x=2, silent=None)[source]
Builds vectors of lat/lon bounds if not present in ds
Assumes a regular rectangular grid - so each lat/lon bound is 0.5*(gap between pixels) over to the next pixel.
- Parameters:
- ds
xarray.Dataset an xarray dataset that may or may not contain variables “lat_bnds” and “lon_bnds”
- silentbool, by default False
if True, then suppresses standard output
- wrap_around_threshnumeric or str, optional, default =
'dynamic' the minimum distance between the last grid cell in longitude and the ‘edges’ of the coordinate system for which the pixels are ‘wrapped around’. By default “dynamic”, which sets this to twice the median difference between longitude values.
In either case, if the lowest and highest lon values have opposite sign, and are both within wrap_around_thresh of -180 or 180, then the grid is assumed to be “wrapping around”, and pixels will line up around (or across, depending on the grid) the anti-meridian.
- break_thresh_xnumeric, default = 2
If the difference between consecutive coordinate values is break_thresh_x times the surrounding coordinate steps (determined by break_window_width), then a break in the grid is assumed and the corresponding grid cell is assumed to be as wide as the preceding one, instead of reaching all the way across the “break”.
- break_window_widthnumeric, default = 3
Window used to determine how anomalous a difference between coordinate is when determining the location of breaks in the grid.
- ds
- Returns:
- ds
xarray.Dataset the same dataset as inputted, unchanged if “lat/lon_bnds” already existed, or with new variables “lat_bnds” and “lon_bnds” if not.
- ds
- xagg.auxfuncs.normalize(a, drop_na=False)[source]
Normalizes the vector a
The vector a is divided by its sum. If all non-np.nan elements of a are 0, then a vector with only np.nan elements is returned.
- Parameters:
- aarray_like
A vector to be normalized
- drop_nabool, optional, default =
False If drop_na = True, and there are nans in the vector a, then the normalization is calculated using only the non-nan locations in a, and the vector is returned with the nans in their original location. In other words, np.nansum(normalize(a),drop_na=True) == 1.0
If drop_na = False, and nans are present in a, then normalize just returns a vector of np.nan the same length of a.
- Returns:
- avector
a, but normalized.
- xagg.auxfuncs.subset_find(ds0, ds1, silent=None)[source]
Finds the grid of ds1 in ds0, and subsets ds0 to the grid in ds1
- Parameters:
- ds0
xarray.Dataset an xarray Dataset to be subset based on the grid of ds1; must contain grid variables “lat” or “lon” (could add a fix_ds call)
- ds1
xarray.Dataset,xarray.DataArray either an xarray structrue (Dataset, DataArray) with “lat” “lon” variables, or a dictionary with DataArrays [‘lat’] and [‘lon’]. IMPORTANT: ds1 HAS TO BE BROADCAST - i.e. one value of lat, lon each coordinate, with lat and lon vectors of equal length. This can be done e.g. using
ds1.stack(loc=('lat','lon')).- silentbool, by default False
if True, then suppresses standard output
- ds0
- Returns:
- ds0
xarray.Dataset The input ds0, subset to the locations in ds1.
- ds0
xagg.classes module
- class xagg.classes.aggregated(agg, source_grid, geometry, ds_in, weights='nowghts')[source]
Bases:
objectClass for aggregated data, output from
xagg.core.aggregate()Methods
to_csv(fn[, silent])Save as csv
to_dataframe([loc_dim])Convert to pandas dataframe.
to_dataset([loc_dim])Convert to xarray dataset.
Convert to wide geopandas geodataframe.
to_netcdf(fn[, loc_dim, silent])Save as netcdf
to_shp(fn[, silent])Save as shapefile
- to_csv(fn, silent=None)[source]
Save as csv
- Parameters:
- fn
str The target filename
- silent
bool, by default False If True, silences status update
- fn
- to_dataframe(loc_dim='poly_idx')[source]
Convert to pandas dataframe.
- Parameters:
- loc_dim
str, by default ‘poly_idx’ What to name the polygon dimension (e.g., ‘county’)
- loc_dim
- to_dataset(loc_dim='poly_idx')[source]
Convert to xarray dataset.
- Parameters:
- loc_dim
str, by default ‘poly_idx’ What to name the polygon dimension (e.g., ‘county’)
- loc_dim
- class xagg.classes.weightmap(agg, source_grid, geometry, overlap_da=None, weights='nowghts')[source]
Bases:
objectClass for mapping from pixels to polgyons, output from
xagg.wrappers.pixel_overlaps()Methods
diag_fig(poly_id, ds[, fig, ax])Create a diagnostic figure showing overlap between pixels and a given polygon
to_file(fn[, overwrite])Save a copy of the weightmap, to avoid recalculating it
xagg.core module
- exception xagg.core.NoOverlapError[source]
Bases:
ExceptionException for when there’s no overlap between pixels and polygons
- xagg.core.aggregate(ds, wm, impl=None, silent=None)[source]
Aggregate raster variable(s) to polygon(s)
Aggregates (N-D) raster variables in ds to the polygons in gfd_out - in other words, gives the weighted average of the values in [ds] based on each pixel’s relative area overlap with the polygons.
The values will be additionally weighted if a weight was inputted into
xagg.core.create_raster_polygons()The code checks whether the input lat/lon grid in ds is equivalent to the linearly indexed grid in wm, or if it can be cropped to that grid.
- Parameters:
- ds
xarray.Dataset an
xarray.Datasetcontaining one or more variables with dimensions lat, lon (and possibly more). The dataset’s geographic grid has to include the lat/lon coordinates used in determining the pixel overlaps inxagg.core.get_pixel_overlaps()(and saved inwm['source_grid'])- wm
xagg.classes.weightmap the output to
xagg.core.get_pixel_overlaps(); axagg.classes.weightmapobject containing['agg']a dataframe, with one row per polygon, and the columns pix_idxs and rel_area, giving the linear indices and the relative area of each pixel over the polygon, respectively
['source_grid']the lat/lon grid on which the aggregating parameters were calculated (and on which the linear indices are based)
- impl:class:str (def:
'for_loop') (set byxa.set_options()) which aggregation calculation method to use, either of:
'for_loop'default behavior, aggregation loops through all polygons in a for loop, requires less memory
'dot_product'aggregation is calculated using a dot product, requires much more memory (due to broadcasting of variables) but may be faster in certain circumstances
'numba'aggregation is calculated using a simplified function that uses numba’s just-in-time compiler. Can be substantially faster
- silent
bool, default = False (set byxa.set_options()) if True, then no status updates are printed to std out
- ds
- Returns:
- agg_out
xagg.classes.aggregated an
xagg.classes.aggregatedobject with the aggregated variables
- agg_out
- xagg.core.create_raster_polygons(ds, mask=None, subset_bbox=None, weights=None, weights_target='ds', wrap_around_thresh=5, silent=None)[source]
Create polygons for each pixel in a raster
Note: ‘lat_bnds’ and ‘lon_bnds’ can be created through the
xagg.aux.get_bnds()function if they are not already included in the input raster file.Note: Currently this code only supports regular rectangular grids (so where every pixel side is a straight line in lat/lon space). Future versions may include support for irregular grids.
- Parameters:
- ds
xarray.Dataset an xarray dataset with the variables ‘lat_bnds’ and ‘lon_bnds’, which are both lat/lon x 2 arrays giving the min and max values of lat and lon for each pixel given by lat/lon
- subset_bbox
geopandas.GeoDataFrame, optional, default =None if a
geopandas.GeoDataFrameis entered, the bounding box around the geometries in the gdf are used to mask the grid, to reduce the number of pixel polygons created
- ds
- Returns:
- pix_agg:
dict a dictionary containing:
'gdf_pixels'a
geopandas.GeoDataFramecontaining a ‘geometry’ giving the pixel boundaries for each ‘lat’ / ‘lon’ pair
'source_grid'a dictionary containing the original lat and lon inputs under the keys “lat” and “lon” (just the
xarray.DataArrayof those variables in the input ds)
- pix_agg:
- xagg.core.get_pixel_overlaps(gdf_in, pix_agg, impl=None)[source]
Get, for each polygon, the pixels that overlap and their area of overlap
Finds, for each polygon in gdf_in, which pixels intersect it, and by how much.
Note: Uses EASE-Grid 2.0 on the WGS84 datum to calculate relative areas (see https://nsidc.org/data/ease)
- Parameters:
- gdf_in
geopandas.GeoDataFrane a
geopandas.GeoDataFramegiving the polygons over which the variables should be aggregated. Can be just a read shapefile (with the added column of “poly_idx”, which is just the index as a column).- pix_agg
dict the output of
xagg.core.create_raster_polygons(); a dict containing:'gdf_pixels'a
geopandas.GeoDataFramegiving for each row the columns “lat” and “lon” (with coordinates) and a polygon giving the boundary of the pixel given by lat/lon
'source_grid'[da.lat,da.lon]of the grid used to create the pixel polygons
- impl
str(set byxa.set_options()) whether the output will be used for the dot-product aggregation calculation (needs a slightly different format), either of: -
'for_loop'(default behavior) -'dot_product'(to set up forimpl='dot_product'inxagg.core.aggregate)
- gdf_in
- Returns:
- wm_out:
dict A dictionary containing:
'agg':a dataframe containing all the fields of
gdf_in(except geometry) and the additional columns:coords: the lat/lon coordiates of all pixels that overlap the polygon of that rowpix_idxs: the linear indices of those pixels within thegdf_pixelsgridrel_area: the relative area of each of the overlaps between the pixels and the polygon (summing to 1 - e.g. if the polygon is exactly the size and location of two pixels, their rel_areas would be 0.5 each)
'source_grid'a dictionary with keys ‘lat’ and ‘lon’ giving the original lat/lon grid whose overlaps with the polygons was calculated
'geometry'just the polygons from
gdf_in
- wm_out:
- xagg.core.make_multipoly(pts)[source]
Split pixel overlapping the antimeridian into MultiPolygon with each sub-Polygon in its own hemisphere
NB: to be used in
create_raster_polygons()
- xagg.core.process_weights(ds, weights=None, target='ds', silent=None)[source]
Process weights - including regridding
If
target == 'ds', regrid weights to ds. Iftarget == 'weights', regrid ds to weights.- Parameters:
- ds
xarray.Dataset,xarray.DataArray an
xarray.Dataset/xarray.DataArrayto regrid- weights
xarray.DataArray, optional, default =None an
xarray.DataArraycontaining a weight (numeric) at each location- target
str, optional, default ='ds' whether weights should be regridded to the ds grid (by default) or vice-versa (not yet supported, returns
NotImplementedError)- silent
bool, default = False (set byxa.set_options()) if True, then no status updates are printed to std out
- ds
- Returns:
- ds
xarray.Dataset,xarray.DataArrays the input
xarray.Dataset/xarray.DataArray, with a new variable weights specifying weights for each pixel- weights_info
dict a dictionary storing information about the weights regridding process, with the fields:
target: showing which of the two grids was retainedds_grid: a dictionary with the grid{"lat":ds.lat,"lon",ds.lon}weights_grid: a dictionary with the grid{"lat":weights.lat,"lon":weights.lon}
- ds
- xagg.core.read_wm(path)[source]
Load temporary weightmap files from
wm.to_file()Builds a weightmap out of saved weightmap component files. Particularly useful if the weightmap took a particularly long time to calculated (i.e., if the grid is particularly high resolution).
Assumes the files are generated from
wm.to_file(); i.e., the files are all in a directory name:- name/name.shp
the geometry of the input polygons
- name/name.agg
the dataframe with the pixel overlap data
- name/name_lat.nc, name/name_lon.nc
the source grid of the raster data
- name/name_weights.nc :
the additional weights grid, if used (this file is optional; if no file with this name is found, no weights are assumed, and wm.weights=’noweights’)
- Parameters:
- path
str The directory in which the files are stored. They are assumed to follow the filename convention of sharing the name of the directory (i.e., the last part of this path.)
- path
- Returns:
- wm
xagg.weightmap
- wm
xagg.diag module
- xagg.diag.diag_fig(wm, poly_id, pix_overlap_info, cmap='magma_r', max_title_depth=5, fig=None, ax=None)[source]
Create a diagnostic figure showing overlap between pixels and a given polygon
- Parameters:
- wm
xagg.classes.weightmap the output to
xagg.core.get_pixel_overlaps()- poly_id
int,list, ordict which polygon to plot. If int, then just the polygon in that indexed row of wm.agg is plotted. If list of int`s, then those polygons are plotted. If dict, then all matches in the geodataframe are plotted (e.g., `poly_id = {‘adm1_code’:’URY-8’}).
- pix_overlap_info
xarray.dataset.Dataset,xarray.dataarray.DataArray, or the output toxa.core.create_raster_polygons() If ds or da, the original dataset used to calculate the wm; needs to be re-entered here because wm does not keep the original pixel polygons. Otherwise, put in the output to
xa.core.create_raster_polygons(ds)()- cmapstr, by default ‘magma_r’
colormap, must be the name of a matplotlib-recognized colormap
- max_title_depthint, by default 5
if only showing one polygon, then the plot title is ‘, ‘.join(), called on the first max_title_depth columns of wm.agg that aren’t added by xagg
- figmpl.figure.Figure or None, by default None
if not None, then this figure handle is used
- axmpl.axis.Axis or None, by default None
if not None, then this axis handle is used
- wm
- Returns:
- fig,axthe matplotlib figure, axis handles
xagg.export module
- xagg.export.export_weightmap(wm_obj, fn, overwrite=False)[source]
Save a copy of the weightmap, to avoid recalculating it
- xagg.export.output_data(agg_obj, output_format, output_fn, loc_dim='poly_idx', silent=False)[source]
Wrapper for prep_for_* functions
- Parameters:
- agg_obj
xagg.classes.aggregated object to be exported
- output_format
str ‘netcdf’, ‘csv’, or ‘shp’
- output_fn: :py:class:`str`
the output filename
- loc_dim
str, optional. default ='poly_idx' the name of the dimension with location indices; used by
xagg.export.prep_for_nc()for nc, csv output
- agg_obj
- Returns:
- the variable that gets saved, so depending on the output_format:
“netcdf”: the
xarray.Dataseton which.to_netcdfwas called“csv”: the
pandas.Dataframeon which.to_csvwas called (uses the xarray ds.to_dataframe() functionality)“shp”: the
geopandas.GeoDataDrameon which.to_filewas called
- xagg.export.prep_for_csv(agg_obj, add_geom=False)[source]
Preps aggregated data for output as a csv
Concretely, aggregated data is placed in a new pandas dataframe and expanded wide - each aggregated variable is placed in new columns; one column per coordinate in each dimension that isn’t the location (poolygon). So, for example, a lat x lon x time variable “tas”, aggregated to location x time, would be reshaped long to columns “tas0”, “tas1”, “tas2”,… for timestep 0, 1, etc.
For data long, use
agg_obj.to_dataframe().to_csv()instead.Note: Currently no support for variables with more than one extra dimension beyond their location dimensions. Potential options: a multi-index column name, so [var]0-0, [var]0-1, etc…
- Parameters:
- agg_obj
xagg.classes.aggregated the output from
aggregate()- add_geombool, default False
if True, adds the geometry from the original shapefile/ geodataframe back into the prepped output; this is for the .to_geodataframe() conversion option.
- agg_obj
- Returns:
- df
pandas.DataFrame a pandas dataframe containing all the fields from the original location polygons + columns containing the values of the aggregated variables at each location. This can then easily be exported as a csv directly (using
df.to_csv()) or to shapefiles by first turning into a geodataframe.
- df
- xagg.export.prep_for_nc(agg_obj, loc_dim='poly_idx')[source]
Preps aggregated data for output as a netcdf
Concretely, aggregated data is placed in a new xarray dataset with dimensions of location (the different polygons in gdf_out) and any other dimension(s) in the original input raster data. All fields from the input polygons are kept as variables with dimension of location.
- Parameters:
- agg_obj
xagg.classes.aggregated - loc_dim
str, optional the name of the location dimension; by definition ‘poly_idx’. Values of that dimension are currently only an integer index (with further information given by the field variables). Future versions may allow, if loc_dim is set to the name of a field in the input polygons, to replace the dimension with the values of that field.
- agg_obj
xagg.wrappers module
- xagg.wrappers.pixel_overlaps(ds, gdf_in, weights=None, weights_target='ds', subset_bbox=True, impl=None, silent=None)[source]
Wrapper function for determining overlaps between grid and polygon
For a geodataframe gdf_in, takes an xarray structure ds (Dataset or DataArray) and for each polygon in gdf_in provides a list of pixels given by the ds grid which overlap that polygon, in addition to their relative area of overlap with the polygon.
The output is then ready to be fed into
xagg.core.aggregate(), which aggregates the variables in ds to the polygons in gdf_in using area- (and optionally other) weights.(NB: the wrapper uses
subset_bbox = Trueinxagg.core.create_raster_polygons())- Parameters:
- ds
xarray.Dataset,xarray.DataArray an xarray Dataset or DataArray containing at least grid variables (“lat”/”lon”, though several other names are supported; see docs for
xagg.aux.fix_ds()) and at least one variable on that grid- gdf_in
geopandas.GeoDataFrame a geopandas GeoDataFrame containing polygons (and any other fields, for example fields from shapefiles)
- weights
xarray.DataArrayorNone, optional, default =None (by default, None) if additional weights are desired, (for example, weighting pixels by population in addition to by area overlap), weights is an
xarray.DataArraycontaining that information. It does not have to be on the same grid as ds - grids will be homogonized (see below).- weights_target
str, optional if ‘ds’, then weights are regridded to the grid in [ds]; if ‘weights’, then the ds variables are regridded to the grid in ‘weights’ (LATTER NOT SUPPORTED YET, raises a NotImplementedError)
- impl
str, optional, default ='for_loop' (set by
xa.set_options()) whether to use the'for_loop'or'dot_product'methods for aggregating; the former uses less memory, the latter may be faster in certain circumstances- silent
bool, default = False (set byxa.set_options()) if True, then no status updates are printed to std out
- ds
- Returns:
- wm_out
dict the output of
xagg.core.get_pixel_overlaps()which gives the mapping of pixels to polygon aggregation; to be input intoxagg.core.aggregate().
- wm_out