Run only a single chunk's worth of data without creating a Dask graph of all my chunks?


# Template xarray Dataset modeled on an Earth Engine (Landsat) export
import numpy as np
import pandas as pd
import dask
import xarray as xr

# Define the dimensions
time = pd.date_range("2020-12-29T18:57:32.281000", periods=3)
X = np.linspace(-421600, 486700, 9084)
Y = np.linspace(-599200, 458500, 10578)

# Create one random array (reused for both data variables below)
data = np.random.rand(len(time), len(X), len(Y)).astype(np.float32)

# Create a dictionary of data variables
data_vars = {
    'SR_B4': (['time', 'X', 'Y'], data),
    'SR_B5': (['time', 'X', 'Y'], data)
}

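# Intentionally tiny chunks: all 3 time steps together, one pixel per spatial chunk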
chunk_size = {'time': 3, 'X': 1, 'Y': 1}

# Create the dataset
ds = xr.Dataset(
    data_vars=data_vars,
    coords={'time': time, 'X': X, 'Y': Y},
    attrs={
        'date_range': '[1365638400000, 1654560000000]',
        'description': '<p>This dataset contains atmospherically corrected data.</p>',
        'keywords': ['cfmask', 'cloud', 'fmask', 'global', 'l8sr', 'landsat'],
        'period': 0,
        'visualization_2_max': 30000.0,
        'visualization_2_min': 0.0,
        'visualization_2_name': 'Shortwave Infrared (753)',
        'crs': 'EPSG:3310'
    }
).chunk(chunk_size)

I want to be able to grab only a single chunk’s worth of data and run it through Dask. However, I don’t want Dask to lazily queue up tasks for every chunk, since I have chosen a very small chunk size. I simply want to run one chunk of a specific size without triggering a computation over the entire dataset; in other words, I’d like Dask to build a task graph consisting of just that one chunk. I could create a dataset that matches the size of my chunk, but I’d like to see whether there is another option. I acknowledge this goes against Dask’s guidance on choosing optimal chunk sizes.
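For illustration, here is a minimal sketch of two possible approaches, assuming the ds built above (names such as ds_unchunked, one_chunk, and block are illustrative, not from the original post): either slice the extent of a single chunk before calling .chunk(), so the task graph never covers the full grid, or index one block of the underlying Dask array and rely on Dask to cull the rest of the graph at compute time.

# Option 1: avoid building the full graph by slicing *before* chunking.
ds_unchunked = xr.Dataset(data_vars=data_vars, coords={'time': time, 'X': X, 'Y': Y})
one_chunk = (
    ds_unchunked
    .isel(X=slice(0, 1), Y=slice(0, 1))   # one chunk's extent: 3 x 1 x 1
    .chunk({'time': 3, 'X': 1, 'Y': 1})
    .compute()
)

# Option 2: on the already-chunked ds, pull a single block of the Dask array.
# The full graph is still constructed by .chunk(), but only the tasks needed
# for this block are executed, because Dask culls the graph when computing.
block = ds['SR_B4'].data.blocks[0, 0, 0].compute()

Note that Option 2 still pays the cost of building the graph for every chunk when .chunk(chunk_size) is called, so only Option 1 truly sidesteps graph construction for the whole dataset.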


