skmap.parallel.blocks.RasterBlockReader

class RasterBlockReader(reference_file=None)

Bases: object

Thread-parallel reader for large rasters.

If reference_file is not None, builds an R-tree index [1] of the block geometries read from the reference_file on initialization. All rasters read with the initialized reader are assumed to have identical geotransforms and block structures to the reference.
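The idea of indexing block geometries can be illustrated without any raster library. The sketch below is not skmap's implementation: it assumes a north-up raster with square pixels, uses the hypothetical helpers block_windows, window_bounds and blocks_in_bounds, and replaces the R-tree with a plain linear scan over bounding boxes.

```python
from typing import Iterator, List, Sequence, Tuple

# (col_off, row_off, width, height) in pixel coordinates
Window = Tuple[int, int, int, int]
# (min_x, min_y, max_x, max_y) in map coordinates
Bounds = Tuple[float, float, float, float]


def block_windows(width: int, height: int, block_w: int, block_h: int) -> Iterator[Window]:
    """Yield one window per block, clipping blocks at the right and bottom edges."""
    for row_off in range(0, height, block_h):
        for col_off in range(0, width, block_w):
            yield (col_off, row_off,
                   min(block_w, width - col_off),
                   min(block_h, height - row_off))


def window_bounds(window: Window, origin_x: float, origin_y: float,
                  pixel_size: float) -> Bounds:
    """Convert a pixel window to map bounds (assumes north-up, square pixels)."""
    col_off, row_off, w, h = window
    min_x = origin_x + col_off * pixel_size
    max_y = origin_y - row_off * pixel_size
    return (min_x, max_y - h * pixel_size, min_x + w * pixel_size, max_y)


def intersects(a: Bounds, b: Bounds) -> bool:
    """True if two bounding boxes overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def blocks_in_bounds(windows: Sequence[Window], query: Bounds, origin_x: float,
                     origin_y: float, pixel_size: float) -> List[Window]:
    """Select the blocks whose bounds overlap the query bounds."""
    # Linear scan over all blocks; an R-tree answers the same query
    # without visiting every block.
    return [w for w in windows
            if intersects(window_bounds(w, origin_x, origin_y, pixel_size), query)]
```

Because all rasters are assumed to share the reference's geotransform and block structure, a lookup like this only needs to be built once and can then be reused for every file that is read.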

Parameters:

reference_file (str) – Path (or URL) of the reference raster. If None, the first file passed to read_overlay is used as the reference.

For full usage examples please refer to the block processing tutorial notebook [2].

References

[1] pygeos STRTree

[2] Raster block processing tutorial

Examples

>>> from skmap.parallel.blocks import RasterBlockReader
>>> from skmap.misc import ttprint
>>>
>>> fp = 'https://s3.eu-central-1.wasabisys.com/skmap/lcv/lcv_landcover.hcl_lucas.corine.rf_p_30m_0..0cm_2019_skmap_epsg3035_v0.1.tif'
>>>
>>> ttprint('initializing reader')
>>> reader = RasterBlockReader(fp)
>>> ttprint('reader initialized')

Methods

read_overlay

Thread-parallel reading of large rasters within a bounding geometry.

read_overlay(src_path, geometry, band=1, geometry_mask=True, max_workers=4, optimize_threadcount=True)

Thread-parallel reading of large rasters within a bounding geometry.

Only blocks that intersect geometry are read. Returns a generator yielding a (data, mask, window) tuple for each block, where:

  • data holds the stacked pixel values of all rasters at the pixels where mask==True,

  • mask is the block data mask, reduced across all rasters via bitwise AND,

  • window is the rasterio.windows.Window [1] of the block within the transform of the reference_file.

All rasters read with the initialized reader are assumed to have identical geotransforms and block structures to the reference_file used for initialization. If the reader was initialized with reference_file==None, the first file in src_path is used as the reference, and the block R-tree is built before data from the first block are yielded.
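The relationship between data and mask can be sketched in a few lines. This is only an illustration under stated assumptions, not the library's code: nested lists stand in for NumPy arrays so the example has no dependencies, reduce_masks and stack_valid are hypothetical names, and the layout of one row per raster is assumed because the text above does not specify the shape of data.

```python
from typing import List, Sequence

Block = List[List[float]]  # 2-D block of pixel values for one raster
Mask = List[List[bool]]    # 2-D block data mask, True where a pixel is valid


def reduce_masks(masks: Sequence[Mask]) -> Mask:
    """AND the per-raster masks: a pixel stays valid only if valid in every raster."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[all(m[r][c] for m in masks) for c in range(cols)]
            for r in range(rows)]


def stack_valid(blocks: Sequence[Block], mask: Mask) -> List[List[float]]:
    """Return one row per raster with its values at the pixels where mask is True."""
    return [
        [value
         for row, mask_row in zip(block, mask)
         for value, keep in zip(row, mask_row)
         if keep]
        for block in blocks
    ]


# Two rasters covering the same 2x2 block, each with one invalid pixel.
blocks = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
masks = [[[True, True], [False, True]], [[True, False], [True, True]]]

mask = reduce_masks(masks)
data = stack_valid(blocks, mask)
```

Here only the top-left and bottom-right pixels are valid in both rasters, so each row of data contains exactly those two values.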

Parameters:
  • src_path (Union[str, Iterable[str]]) – Path(s) (or URLs) of the raster file(s) to read.

  • geometry (dict) – The bounding geometry within which to read raster blocks, given as a dictionary (with the GeoJSON geometry schema).

  • band (int) – Index of band to read from all rasters.

  • geometry_mask (bool) – Indicates whether or not to use the geometry as a data mask. If False, the block data will be returned in its entirety, even if part of it falls outside the geometry.

  • max_workers (int) – Maximum number of worker threads to use, defaults to multiprocessing.cpu_count().

  • optimize_threadcount (bool) – Whether or not to optimize the number of workers. If True, the number of worker threads will be iteratively increased until the average read time per block stops decreasing or max_workers is reached. If False, max_workers will be used as the number of threads.
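The stopping rule described for optimize_threadcount can be sketched as follows. This is a minimal illustration, not the library's code: pick_worker_count and avg_block_time are hypothetical names, and starting at one worker and increasing in steps of one are assumptions, since the description above does not state the starting point or step size.

```python
from typing import Callable


def pick_worker_count(avg_block_time: Callable[[int], float], max_workers: int) -> int:
    """Increase the worker count until the average block time stops decreasing."""
    best_n = 1
    best_t = avg_block_time(1)
    for n in range(2, max_workers + 1):
        t = avg_block_time(n)
        if t >= best_t:
            # No further improvement: keep the last count that helped.
            break
        best_n, best_t = n, t
    return best_n


# Hypothetical measurements: average seconds per block for each worker count.
timings = {1: 1.0, 2: 0.6, 3: 0.5, 4: 0.55, 5: 0.1}
chosen = pick_worker_count(timings.__getitem__, max_workers=8)
```

With these made-up timings the search stops at three workers, because four workers are slower than three. A greedy rule like this never sees the faster timing at five workers, which is the usual trade-off of stopping at the first non-improvement.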

Returns:

Generator yielding (data, mask, window) tuples for each block.

Return type:

Iterator[Tuple[ndarray, ndarray, Window]]
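The general pattern of a generator backed by a thread pool can be sketched with the standard library alone. This is an assumption-laden illustration rather than skmap's implementation: read_blocks and read_one are hypothetical names, read_one stands in for whatever function reads a single block, and results are yielded in input order, which the documentation above does not guarantee for read_overlay.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

W = TypeVar("W")
R = TypeVar("R")


def read_blocks(read_one: Callable[[W], R], windows: Iterable[W],
                max_workers: int = 4) -> Iterator[Tuple[R, W]]:
    """Read blocks on a thread pool and yield (result, window) pairs in input order."""
    windows = list(windows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() submits every task up front and yields results in submission order.
        for window, result in zip(windows, pool.map(read_one, windows)):
            yield result, window


def fake_read(window: Tuple[int, int, int, int]) -> int:
    """Stand-in for a block read: returns the sum of the window fields."""
    return sum(window)


gen = read_blocks(fake_read, [(0, 0, 2, 2), (2, 0, 2, 2)], max_workers=2)
results = list(gen)
```

Because the function is a generator, nothing is read until the first item is requested, and the pool is shut down once the generator is exhausted.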

For full usage examples please refer to the block processing tutorial notebook [2].

References

[1] Rasterio Window

[2] Raster block processing tutorial

Examples

>>> geom = {
...     'type': 'Polygon',
...     'coordinates': [[
...         [4765389, 2441103],
...         [4764441, 2439352],
...         [4767369, 2438696],
...         [4761659, 2441949],
...         [4765389, 2441103],
...     ]],
... }
>>> block_data_gen = reader.read_overlay(fp, geom)
>>> data, mask, window = next(block_data_gen)