xagg package
Submodules
xagg.auxfuncs module
- xagg.auxfuncs.fix_ds(ds, var_cipher={'Lat': {'Lat': 'lat', 'Lon': 'lon'}, 'Latitude': {'Latitude': 'lat', 'Longitude': 'lon'}, 'Y': {'X': 'lon', 'Y': 'lat'}, 'latitude': {'latitude': 'lat', 'longitude': 'lon'}, 'latitude_1': {'latitude_1': 'lat', 'longitude_1': 'lon'}, 'nav_lat': {'nav_lat': 'lat', 'nav_lon': 'lon'}, 'y': {'x': 'lon', 'y': 'lat'}}, chg_bnds=True)[source]
Puts the input ds into a format compatible with the rest of the package:
- grid variables are renamed "lat" and "lon"
- the lon dimension is converted to -180:180, to be consistent with most geographic data (as is any lon_bnds variable, if chg_bnds=True, the default)
- the dataset is sorted in ascending order in both lat and lon
NOTE: there should probably be a safeguard in case "y" and "x" are multi-index dimension names instead of lat/lon names; maybe a warning for now... (TO DO)
- Parameters:
  - ds : xarray.Dataset
    an input xarray.Dataset, which may or may not need adjustment to be compatible with this package
  - var_cipher : dict, optional
    a dict of dicts for renaming lat/lon variables to "lat"/"lon". The form is {search_str:{lat_name:'lat',lon_name:'lon'},...}; the code looks for search_str in the dimensions of the ds and, based on that, renames lat_name to 'lat' and lon_name to 'lon'. Common names for these variables ('latitude', 'Latitude', 'Lat', 'latitude_1', 'nav_lat', 'Y') are included out of the box.
  - chg_bnds : bool, optional, default = True
    if True, variables with "_bnd" in their names are assumed to be dimension bound variables and are renamed as well, if the rest of their name matches 'o' (for lon) or 'a' (for lat). ## DOES THIS WORK FOR "X" and "Y"?
- Returns:
  - ds : xarray.Dataset
    a dataset with lat/lon variables in the format necessary for this package to function
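The -180:180 longitude conversion described above can be sketched with simple modular arithmetic (to_standard_lon is a hypothetical helper for illustration, not part of xagg, which operates on the full xarray.Dataset):

```python
# Hypothetical sketch of the lon convention fix_ds applies: map a 0:360
# longitude axis onto -180:180 and sort ascending.
def to_standard_lon(lons):
    """Map longitudes from a 0:360 convention to -180:180, sorted ascending."""
    return sorted(((lon + 180) % 360) - 180 for lon in lons)

print(to_standard_lon([0, 90, 180, 270]))  # [-180, -90, 0, 90]
```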
- xagg.auxfuncs.get_bnds(ds, wrap_around_thresh='dynamic', break_window_width=3, break_thresh_x=2, silent=False)[source]
Builds vectors of lat/lon bounds if not present in ds
Assumes a regular rectangular grid - so each lat/lon bound is 0.5*(gap between pixels) over to the next pixel.
- Parameters:
  - ds : xarray.Dataset
    an xarray dataset that may or may not contain the variables "lat_bnds" and "lon_bnds"
  - wrap_around_thresh : numeric or str, optional, default = 'dynamic'
    the minimum distance between the last grid cell in longitude and the 'edges' of the coordinate system for which the pixels are 'wrapped around'. By default 'dynamic', which sets this to twice the median difference between longitude values. In either case, if the lowest and highest lon values have opposite signs and are both within wrap_around_thresh of -180 or 180, then the grid is assumed to be "wrapping around", and pixels will line up around (or across, depending on the grid) the antimeridian.
  - break_window_width : numeric, optional, default = 3
    window used to determine how anomalous a difference between coordinate values is when determining the location of breaks in the grid
  - break_thresh_x : numeric, optional, default = 2
    if the difference between consecutive coordinate values is break_thresh_x times the surrounding coordinate steps (determined by break_window_width), then a break in the grid is assumed, and the corresponding grid cell is taken to be as wide as the preceding one instead of reaching all the way across the "break"
  - silent : bool, optional, default = False
    if True, suppresses standard output
- Returns:
- ds : xarray.Dataset
  the same dataset as inputted, unchanged if "lat_bnds"/"lon_bnds" already existed, or with the new variables "lat_bnds" and "lon_bnds" if not
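The half-gap bound construction on a regular grid can be sketched as follows (center_to_bounds is an illustrative helper, not the library's implementation, and it ignores the wrap-around and grid-break handling described above):

```python
def center_to_bounds(centers):
    """Compute [lower, upper] bounds for each cell center on a regular grid,
    placing each bound halfway to the neighboring center; edge cells reuse
    the adjacent gap."""
    bnds = []
    for i, c in enumerate(centers):
        gap_lo = centers[i] - centers[i - 1] if i > 0 else centers[1] - centers[0]
        gap_hi = centers[i + 1] - centers[i] if i < len(centers) - 1 else gap_lo
        bnds.append([c - 0.5 * gap_lo, c + 0.5 * gap_hi])
    return bnds

print(center_to_bounds([0.0, 1.0, 2.0]))  # [[-0.5, 0.5], [0.5, 1.5], [1.5, 2.5]]
```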
- xagg.auxfuncs.normalize(a, drop_na=False)[source]
Normalizes the vector a
The vector a is divided by its sum. If all non-np.nan elements of a are 0, then a vector with only np.nan elements is returned.
- Parameters:
  - a : array_like
    a vector to be normalized
  - drop_na : bool, optional, default = False
    if drop_na = True and there are nans in the vector a, then the normalization is calculated using only the non-nan locations in a, and the vector is returned with the nans in their original locations (in other words, np.nansum(normalize(a,drop_na=True)) == 1.0). If drop_na = False and nans are present in a, then normalize just returns a vector of np.nan the same length as a.
- Returns:
- a : vector
  a, but normalized
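The documented behavior can be illustrated with a plain-Python sketch (normalize_sketch is hypothetical and not the library's actual code, which operates on numpy arrays):

```python
import math

def normalize_sketch(a, drop_na=False):
    """Illustrative reimplementation of the documented normalize() behavior."""
    has_nan = any(math.isnan(x) for x in a)
    if has_nan and not drop_na:
        # nans present and not dropped: return an all-nan vector
        return [math.nan] * len(a)
    total = sum(x for x in a if not math.isnan(x))
    if total == 0:
        # all non-nan elements are 0: return an all-nan vector
        return [math.nan] * len(a)
    return [x / total if not math.isnan(x) else math.nan for x in a]

print(normalize_sketch([1.0, 1.0, 2.0]))                     # [0.25, 0.25, 0.5]
print(normalize_sketch([1.0, math.nan, 3.0], drop_na=True))  # [0.25, nan, 0.75]
```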
- xagg.auxfuncs.subset_find(ds0, ds1, silent=False)[source]
Finds the grid of ds1 in ds0, and subsets ds0 to the grid in ds1
- Parameters:
  - ds0 : xarray.Dataset
    an xarray Dataset to be subset based on the grid of ds1; must contain the grid variables "lat" and "lon" (a fix_ds call could be added here)
  - ds1 : xarray.Dataset, xarray.DataArray
    either an xarray structure (Dataset, DataArray) with "lat"/"lon" variables, or a dictionary with DataArrays ['lat'] and ['lon']. IMPORTANT: ds1 HAS TO BE BROADCAST - i.e., one value of lat and lon for each coordinate, with the lat and lon vectors of equal length. This can be done, e.g., using ds1.stack(loc=('lat','lon')).
  - silent : bool, optional, default = False
    if True, suppresses standard output
- Returns:
  - ds0 : xarray.Dataset
    the input ds0, subset to the locations in ds1
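The broadcast requirement can be illustrated in plain Python: the point is that lat and lon become equal-length vectors with one entry per location, as ds1.stack(loc=('lat','lon')) would produce from separate 1-D coordinate axes (the values below are arbitrary illustrative coordinates):

```python
from itertools import product

# Separate 1-D coordinate axes ...
lat_axis = [10, 20]
lon_axis = [100, 110, 120]

# ... become equal-length, per-location vectors once "broadcast":
lat_bcast, lon_bcast = zip(*product(lat_axis, lon_axis))

print(list(lat_bcast))  # [10, 10, 10, 20, 20, 20]
print(list(lon_bcast))  # [100, 110, 120, 100, 110, 120]
```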
xagg.classes module
- class xagg.classes.aggregated(agg, source_grid, geometry, ds_in, weights='nowghts')[source]
Bases:
object
Class for aggregated data, output from
xagg.core.aggregate()
Methods
- to_csv(fn[, silent]) : save as csv
- to_dataframe([loc_dim]) : convert to pandas dataframe
- to_dataset([loc_dim]) : convert to xarray dataset
- to_geodataframe() : convert to geopandas geodataframe
- to_netcdf(fn[, loc_dim, silent]) : save as netcdf
- to_shp(fn[, silent]) : save as shapefile
- to_csv(fn, silent=False)[source]
Save as csv
- Parameters:
- fn : str
The target filename
- silent : bool, optional, default = False
If True, silences standard out
- class xagg.classes.weightmap(agg, source_grid, geometry, overlap_da=None, weights='nowghts')[source]
Bases:
object
Class for mapping from pixels to polygons, output from
xagg.wrappers.pixel_overlaps()
Methods
- diag_fig(poly_id, ds) : create a diagnostic figure showing the overlap between pixels and a given polygon
- to_file(fn[, overwrite]) : save a copy of the weightmap, to avoid recalculating it
xagg.core module
- exception xagg.core.NoOverlapError[source]
Bases:
Exception
Exception for when there’s no overlap between pixels and polygons
- xagg.core.aggregate(ds, wm, impl='for_loop', silent=False)[source]
Aggregate raster variable(s) to polygon(s)
Aggregates (N-D) raster variables in ds to the polygons in gdf_out - in other words, gives the weighted average of the values in ds based on each pixel's relative area of overlap with the polygons.
The values will be additionally weighted if a weight was inputted into
xagg.core.create_raster_polygons()
The code checks whether the input lat/lon grid in ds is equivalent to the linearly indexed grid in wm, or if it can be cropped to that grid.
- Parameters:
  - ds : xarray.Dataset
    an xarray.Dataset containing one or more variables with dimensions lat, lon (and possibly more). The dataset's geographic grid has to include the lat/lon coordinates used in determining the pixel overlaps in xagg.core.get_pixel_overlaps() (and saved in wm['source_grid'])
  - wm : xagg.classes.weightmap
    the output of xagg.core.get_pixel_overlaps(); a xagg.classes.weightmap object containing:
    - ['agg'] : a dataframe, with one row per polygon, and the columns pix_idxs and rel_area, giving the linear indices and the relative area of each pixel over the polygon, respectively
    - ['source_grid'] : the lat/lon grid on which the aggregating parameters were calculated (and on which the linear indices are based)
  - impl : str, optional, default = 'for_loop'
    which aggregation calculation method to use, either of:
    - 'for_loop' : the default behavior; aggregation loops through all polygons in a for loop, which requires less memory
    - 'dot_product' : aggregation is calculated using a dot product, which requires much more memory (due to broadcasting of variables) but may be faster in certain circumstances
  - silent : bool, optional, default = False
    if True, no status updates are printed to std out
- Returns:
- agg_out : xagg.classes.aggregated
  an xagg.classes.aggregated object with the aggregated variables
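The core computation - a weighted average of pixel values, weighted by relative overlap area - can be sketched in plain Python (weighted_agg is illustrative, not xagg's internal code, which also folds in any additional weights):

```python
# Minimal sketch of the per-polygon area-weighted average aggregate() computes.
def weighted_agg(values, rel_area):
    """Weighted mean of pixel values, weighted by relative overlap area."""
    return sum(v * w for v, w in zip(values, rel_area)) / sum(rel_area)

# A polygon covered by three pixels with values 1, 2, 4 and relative
# overlap areas 0.5, 0.25, 0.25:
print(weighted_agg([1.0, 2.0, 4.0], [0.5, 0.25, 0.25]))  # 2.0
```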
- xagg.core.create_raster_polygons(ds, mask=None, subset_bbox=None, weights=None, weights_target='ds', wrap_around_thresh=5, silent=False)[source]
Create polygons for each pixel in a raster
Note: 'lat_bnds' and 'lon_bnds' can be created through the xagg.auxfuncs.get_bnds() function if they are not already included in the input raster file.
Note: Currently this code only supports regular rectangular grids (i.e., where every pixel side is a straight line in lat/lon space). Future versions may include support for irregular grids.
- Parameters:
  - ds : xarray.Dataset
    an xarray dataset with the variables 'lat_bnds' and 'lon_bnds', which are both lat/lon x 2 arrays giving the min and max values of lat and lon for each pixel given by lat/lon
  - subset_bbox : geopandas.GeoDataFrame, optional, default = None
    if a geopandas.GeoDataFrame is entered, the bounding box around the geometries in the gdf is used to mask the grid, to reduce the number of pixel polygons created
- Returns:
- pix_agg : dict
  a dictionary containing:
  - 'gdf_pixels' : a geopandas.GeoDataFrame containing a 'geometry' column giving the pixel boundaries for each 'lat'/'lon' pair
  - 'source_grid' : a dictionary containing the original lat and lon inputs under the keys "lat" and "lon" (just the xarray.DataArray of those variables in the input ds)
- xagg.core.get_pixel_overlaps(gdf_in, pix_agg, impl='for_loop')[source]
Get, for each polygon, the pixels that overlap and their area of overlap
Finds, for each polygon in gdf_in, which pixels intersect it, and by how much.
Note: Uses EASE-Grid 2.0 on the WGS84 datum to calculate relative areas (see https://nsidc.org/data/ease)
- Parameters:
  - gdf_in : geopandas.GeoDataFrame
    a geopandas.GeoDataFrame giving the polygons over which the variables should be aggregated. Can be just a read shapefile (with the added column "poly_idx", which is just the index as a column).
  - pix_agg : dict
    the output of xagg.core.create_raster_polygons(); a dict containing:
    - 'gdf_pixels' : a geopandas.GeoDataFrame giving for each row the columns "lat" and "lon" (with coordinates) and a polygon giving the boundary of the pixel given by lat/lon
    - 'source_grid' : [da.lat, da.lon] of the grid used to create the pixel polygons
  - impl : str, optional, default = 'for_loop'
    whether the output will be used for the dot-product aggregation calculation (which needs a slightly different format), either of:
    - 'for_loop' : the default behavior
    - 'dot_product' : to set up for impl='dot_product' in xagg.core.aggregate()
- Returns:
- wm_out : dict
  a dictionary containing:
  - 'agg' : a dataframe containing all the fields of gdf_in (except geometry) and the additional columns:
    - coords : the lat/lon coordinates of all pixels that overlap the polygon of that row
    - pix_idxs : the linear indices of those pixels within the gdf_pixels grid
    - rel_area : the relative area of each of the overlaps between the pixels and the polygon (summing to 1 - e.g., if the polygon is exactly the size and location of two pixels, their rel_areas would be 0.5 each)
  - 'source_grid' : a dictionary with keys 'lat' and 'lon' giving the original lat/lon grid whose overlaps with the polygons were calculated
  - 'geometry' : just the polygons from gdf_in
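The rel_area logic can be illustrated with a toy axis-aligned box intersection (a simplified stand-in for the true polygon intersections, which xagg computes in an equal-area projection):

```python
def overlap_area(pix, poly):
    """Area of overlap between two axis-aligned boxes (lon0, lat0, lon1, lat1).
    A toy stand-in for the polygon intersections get_pixel_overlaps() performs."""
    w = min(pix[2], poly[2]) - max(pix[0], poly[0])
    h = min(pix[3], poly[3]) - max(pix[1], poly[1])
    return max(w, 0) * max(h, 0)

# A polygon exactly covering two unit pixels -> rel_area of 0.5 each:
pixels = [(0, 0, 1, 1), (1, 0, 2, 1)]
poly = (0, 0, 2, 1)
areas = [overlap_area(p, poly) for p in pixels]
rel_area = [a / sum(areas) for a in areas]
print(rel_area)  # [0.5, 0.5]
```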
- xagg.core.make_multipoly(pts)[source]
Split a pixel overlapping the antimeridian into a MultiPolygon, with each sub-Polygon in its own hemisphere
NB: to be used in
create_raster_polygons()
- xagg.core.process_weights(ds, weights=None, target='ds', silent=False)[source]
Process weights - including regridding
If target == 'ds', regrid weights to ds. If target == 'weights', regrid ds to weights.
- Parameters:
  - ds : xarray.Dataset, xarray.DataArray
    an xarray.Dataset/xarray.DataArray to regrid
  - weights : xarray.DataArray, optional, default = None
    an xarray.DataArray containing a weight (numeric) at each location
  - target : str, optional, default = 'ds'
    whether weights should be regridded to the ds grid (the default) or vice versa (not yet supported; raises NotImplementedError)
  - silent : bool, optional, default = False
    if True, no status updates are printed to std out
- Returns:
- ds : xarray.Dataset, xarray.DataArray
  the input xarray.Dataset/xarray.DataArray, with a new variable weights specifying weights for each pixel
- weights_info : dict
  a dictionary storing information about the weights regridding process, with the fields:
  - target : showing which of the two grids was retained
  - ds_grid : a dictionary with the grid {"lat":ds.lat,"lon":ds.lon}
  - weights_grid : a dictionary with the grid {"lat":weights.lat,"lon":weights.lon}
- xagg.core.read_wm(path)[source]
Load temporary weightmap files from
wm.to_file()
Builds a weightmap out of saved weightmap component files. Particularly useful if the weightmap took a long time to calculate (e.g., if the grid is particularly high resolution).
Assumes the files are generated from
wm.to_file()
; i.e., the files are all in a directory [name]:
- name/name.shp
the geometry of the input polygons
- name/name.agg
the dataframe with the pixel overlap data
- name/name_lat.nc, name/name_lon.nc
the source grid of the raster data
- name/name_weights.nc
the additional weights grid, if used (this file is optional; if no file with this name is found, no weights are assumed, and wm.weights='nowghts')
- Parameters:
- path : str
  the directory in which the files are stored. They are assumed to follow the filename convention of sharing the name of the directory (i.e., the last part of this path).
- Returns:
- wm : xagg.classes.weightmap
xagg.diag module
- xagg.diag.diag_fig(wm, poly_id, pix_overlap_info, cmap='magma_r', max_title_depth=5, fig=None, ax=None)[source]
Create a diagnostic figure showing overlap between pixels and a given polygon
- Parameters:
- wm : xagg.classes.weightmap
  the output of xagg.core.get_pixel_overlaps()
- poly_id : int, list, or dict
  which polygon(s) to plot. If an int, then just the polygon in that indexed row of wm.agg is plotted. If a list of ints, then those polygons are plotted. If a dict, then all matches in the geodataframe are plotted (e.g., poly_id = {'adm1_code':'URY-8'}).
- pix_overlap_info : xarray.Dataset, xarray.DataArray, or the output of xa.core.create_raster_polygons()
  if a Dataset or DataArray, the original dataset used to calculate the wm; it needs to be re-entered here because wm does not keep the original pixel polygons. Otherwise, put in the output of xa.core.create_raster_polygons(ds).
- cmap : str, optional, default = 'magma_r'
  colormap; must be the name of a matplotlib-recognized colormap
- max_title_depth : int, optional, default = 5
  if only showing one polygon, then the plot title is ', '.join() called on the first max_title_depth columns of wm.agg that aren't added by xagg
- fig : mpl.figure.Figure or None, optional, default = None
  if not None, then this figure handle is used
- ax : mpl.axes.Axes or None, optional, default = None
  if not None, then this axis handle is used
- Returns:
- fig, ax : the matplotlib figure and axis handles
xagg.export module
- xagg.export.export_weightmap(wm_obj, fn, overwrite=False)[source]
Save a copy of the weightmap, to avoid recalculating it
- xagg.export.output_data(agg_obj, output_format, output_fn, loc_dim='poly_idx', silent=False)[source]
Wrapper for prep_for_* functions
- Parameters:
  - agg_obj : xagg.classes.aggregated
    the object to be exported
  - output_format : str
    'netcdf', 'csv', or 'shp'
  - output_fn : str
    the output filename
  - loc_dim : str, optional, default = 'poly_idx'
    the name of the dimension with location indices; used by xagg.export.prep_for_nc() for nc and csv output
- Returns:
- the variable that gets saved, depending on the output_format:
  - "netcdf" : the xarray.Dataset on which .to_netcdf was called
  - "csv" : the pandas.DataFrame on which .to_csv was called (uses the xarray ds.to_dataframe() functionality)
  - "shp" : the geopandas.GeoDataFrame on which .to_file was called
- xagg.export.prep_for_csv(agg_obj, add_geom=False)[source]
Preps aggregated data for output as a csv
Concretely, aggregated data is placed in a new pandas dataframe and expanded wide - each aggregated variable is placed in new columns, one column per coordinate in each dimension that isn't the location (polygon). So, for example, a lat x lon x time variable "tas", aggregated to location x time, would be reshaped wide to columns "tas0", "tas1", "tas2", ... for timesteps 0, 1, etc.
Note: Currently no support for variables with more than one extra dimension beyond their location dimensions. Potential options: a multi-index column name, so [var]0-0, [var]0-1, etc…
- Parameters:
- agg_obj : xagg.classes.aggregated
  the output from aggregate()
- add_geom : bool, optional, default = False
  if True, adds the geometry from the original shapefile/geodataframe back into the prepped output; this is for the .to_geodataframe() conversion option
- Returns:
- df : pandas.DataFrame
  a pandas dataframe containing all the fields from the original location polygons + columns containing the values of the aggregated variables at each location. This can then easily be exported as a csv directly (using df.to_csv) or to shapefiles by first turning it into a geodataframe.
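The wide reshape can be sketched in plain Python (the dict-of-lists layout and the temperature values are illustrative; the library works with pandas dataframes):

```python
# A hypothetical location x time variable "tas": one row per polygon, three
# timesteps each.
rows = {"poly_idx": [0, 1],
        "tas": [[280.1, 281.3, 282.0],   # polygon 0, timesteps 0..2
                [275.0, 275.5, 276.2]]}  # polygon 1, timesteps 0..2

# Reshape wide: one column per timestep, named tas0, tas1, tas2, ...
wide = {"poly_idx": rows["poly_idx"]}
for t in range(len(rows["tas"][0])):
    wide[f"tas{t}"] = [vals[t] for vals in rows["tas"]]

print(sorted(wide))  # ['poly_idx', 'tas0', 'tas1', 'tas2']
```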
- xagg.export.prep_for_nc(agg_obj, loc_dim='poly_idx')[source]
Preps aggregated data for output as a netcdf
Concretely, aggregated data is placed in a new xarray dataset with dimensions of location (the different polygons in gdf_out) and any other dimension(s) in the original input raster data. All fields from the input polygons are kept as variables with dimension of location.
- Parameters:
- agg_obj : xagg.classes.aggregated
  the output from aggregate()
- loc_dim : str, optional, default = 'poly_idx'
  the name of the location dimension. Values of that dimension are currently only an integer index (with further information given by the field variables). Future versions may allow, if loc_dim is set to the name of a field in the input polygons, replacing the dimension with the values of that field.
xagg.wrappers module
- xagg.wrappers.pixel_overlaps(ds, gdf_in, weights=None, weights_target='ds', subset_bbox=True, impl='for_loop', silent=False)[source]
Wrapper function for determining overlaps between grid and polygon
For a geodataframe gdf_in, takes an xarray structure ds (Dataset or DataArray) and for each polygon in gdf_in provides a list of pixels given by the ds grid which overlap that polygon, in addition to their relative area of overlap with the polygon.
The output is then ready to be fed into xagg.core.aggregate(), which aggregates the variables in ds to the polygons in gdf_in using area (and, optionally, other) weights.
(NB: the wrapper uses subset_bbox = True in xagg.core.create_raster_polygons())
- Parameters:
- ds : xarray.Dataset, xarray.DataArray
  an xarray Dataset or DataArray containing at least grid variables ("lat"/"lon", though several other names are supported; see docs for xagg.auxfuncs.fix_ds()) and at least one variable on that grid
- gdf_in : geopandas.GeoDataFrame
  a geopandas GeoDataFrame containing polygons (and any other fields, for example fields from shapefiles)
- weights : xarray.DataArray or None, optional, default = None
  if additional weights are desired (for example, weighting pixels by population in addition to by area overlap), weights is an xarray.DataArray containing that information. It does not have to be on the same grid as ds - the grids will be homogenized (see below).
- weights_target : str, optional, default = 'ds'
  if 'ds', then weights are regridded to the grid in ds; if 'weights', then the ds variables are regridded to the grid in weights (LATTER NOT SUPPORTED YET; raises a NotImplementedError)
- impl : str, optional, default = 'for_loop'
  whether to use the 'for_loop' or 'dot_product' method for aggregating; the former uses less memory, the latter may be faster in certain circumstances
- silent : bool, optional, default = False
  if True, no status updates are printed to std out
- Returns:
- wm_out : xagg.classes.weightmap
  the output of xagg.core.get_pixel_overlaps(), which gives the mapping of pixels to polygons for aggregation; to be input into xagg.core.aggregate()