# Template xarray based on Earth Engine
import numpy as np
import pandas as pd
import dask
import xarray as xr
# Define the dimensions
time = pd.date_range("2020-12-29T18:57:32.281000", periods=3)
X = np.linspace(-421600, 486700, 9084)
Y = np.linspace(-599200, 458500, 10578)
# Create a data array with random data for each variable
data = np.random.rand(len(time), len(X), len(Y)).astype(np.float32)
# Create a dictionary of data variables
data_vars = {
    'SR_B4': (['time', 'X', 'Y'], data),
    'SR_B5': (['time', 'X', 'Y'], data)
}
chunk_size = {'time': 3, 'X': 1, 'Y': 1}
# Create the dataset
ds = xr.Dataset(
    data_vars=data_vars,
    coords={'time': time, 'X': X, 'Y': Y},
    attrs={
        'date_range': '[1365638400000, 1654560000000]',
        'description': '<p>This dataset contains atmospherically corrected data.</p>',
        'keywords': ['cfmask', 'cloud', 'fmask', 'global', 'l8sr', 'landsat'],
        'period': 0,
        'visualization_2_max': 30000.0,
        'visualization_2_min': 0.0,
        'visualization_2_name': 'Shortwave Infrared (753)',
        'crs': 'EPSG:3310'
    }
).chunk(chunk_size)
I want to grab only a single chunk’s worth of data and run it through Dask. However, because I have chosen a very small chunk size, I don’t want Dask to lazily queue up a task for every chunk. I simply want to run one chunk of a specific size without computing over the entire dataset. In other words, I’d like Dask to build a task graph that covers just that one chunk. I could create a dataset that matches the size of my chunk, but I’d like to know whether there is another option. I acknowledge that this goes against the Dask documentation’s guidance on choosing chunk sizes.
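One possible approach, sketched here as an illustration rather than a definitive answer, is to select the window first and chunk afterwards, so that the selection itself becomes the only Dask chunk. The sketch assumes the data starts out as in-memory NumPy arrays, as in the template above. The reduced array sizes, the name `ds_small`, and the 16 x 16 `window` are hypothetical stand-ins chosen so that the example runs quickly.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small stand-in for the dataset in the question (sizes reduced; an assumption).
time = pd.date_range("2020-12-29T18:57:32.281000", periods=3)
X = np.linspace(-421600, 486700, 90)
Y = np.linspace(-599200, 458500, 100)
data = np.random.rand(len(time), len(X), len(Y)).astype(np.float32)

ds_small = xr.Dataset(
    {"SR_B4": (["time", "X", "Y"], data), "SR_B5": (["time", "X", "Y"], data)},
    coords={"time": time, "X": X, "Y": Y},
)

# Hypothetical window standing in for "one chunk's worth" of data.
window = {"X": slice(0, 16), "Y": slice(0, 16)}

# Select first, then chunk: -1 means "one chunk spanning the whole dimension",
# so each variable in the selection is backed by a single Dask block.
one_chunk = ds_small.isel(window).chunk({"time": -1, "X": -1, "Y": -1})

# Number of blocks along each dimension of the Dask array behind SR_B4.
n_blocks = one_chunk["SR_B4"].data.numblocks

# Only the selected window is involved in this computation.
result = one_chunk["SR_B4"].mean().compute()
```

If the dataset has already been chunked, indexing the underlying Dask array through its `blocks` accessor (for example `ds["SR_B4"].data.blocks[0, 0, 0]`) may be another way to pull out one existing chunk, although that returns a bare Dask array rather than an xarray object.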