`xarray`: setting `drop=True` when filtering a `Dataset` causes `IndexError: dimension coordinate conflicts between indexed and indexing objects`

  • Thread starter: bzm3r

Some preliminary setup:

Code:
import xarray as xr
import numpy as np

xr.set_options(display_style="text")

Code:
<xarray.core.options.set_options at 0x7f3777111e50>

Suppose that I have labels which are composed of two parts: first and second:

Code:
raw_labels = np.array(
    [["a", "c"], ["b", "a"], ["a", "b"], ["c", "a"]],
    dtype="<U1",
)
raw_labels

Code:
array([['a', 'c'],
       ['b', 'a'],
       ['a', 'b'],
       ['c', 'a']], dtype='<U1')

I can make an xarray.DataArray easily enough to represent this raw information with informative tags:

Code:
label_metas = xr.DataArray(
    raw_labels,
    dims=("label", "parts"),
    coords={
        "label": ["-".join(x) for x in raw_labels],
        "parts": ["first", "second"],
    },
    name="meta",
)
label_metas

Code:
<xarray.DataArray 'meta' (label: 4, parts: 2)> Size: 32B
array([['a', 'c'],
       ['b', 'a'],
       ['a', 'b'],
       ['c', 'a']], dtype='<U1')
Coordinates:
  * label    (label) <U3 48B 'a-c' 'b-a' 'a-b' 'c-a'
  * parts    (parts) <U6 48B 'first' 'second'

Now suppose that I have additional information for a label: let's say it is some count information for simplicity.

Code:
raw_counts = np.random.randint(0, 100, size=len(label_metas))
raw_counts

Code:
array([95, 23,  6, 77])

Code:
label_counts = xr.DataArray(
    raw_counts,
    dims="label",
    coords={"label": label_metas.coords["label"]},
    name="count",
)
label_counts

Code:
<xarray.DataArray 'count' (label: 4)> Size: 32B
array([95, 23,  6, 77])
Coordinates:
  * label    (label) <U3 48B 'a-c' 'b-a' 'a-b' 'c-a'

How do I combine these clearly related xr.DataArrays? From what I understand: by using xr.Datasets.

Code:
label_info = xr.merge([label_metas, label_counts])
label_info

Code:
<xarray.Dataset> Size: 160B
Dimensions:  (label: 4, parts: 2)
Coordinates:
  * label    (label) <U3 48B 'a-c' 'b-a' 'a-b' 'c-a'
  * parts    (parts) <U6 48B 'first' 'second'
Data variables:
    meta     (label, parts) <U1 32B 'a' 'c' 'b' 'a' 'a' 'b' 'c' 'a'
    count    (label) int64 32B 95 23 6 77

Now suppose I want to filter this dataset so that only the labels whose first part is 'a' remain. How would I go about it? According to the docs (https://docs.xarray.dev/en/stable/generated/xarray.Dataset.where.html), where can be applied to an xr.Dataset too, but no examples show this in action. Here are the results of my experiments:

Code:
label_info["meta"].sel(parts="first")

Code:
<xarray.DataArray 'meta' (label: 4)> Size: 16B
array(['a', 'b', 'a', 'c'], dtype='<U1')
Coordinates:
  * label    (label) <U3 48B 'a-c' 'b-a' 'a-b' 'c-a'
    parts    <U6 24B 'first'
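
Note that the selection above still lists parts under Coordinates, now without a dimension: it has become a scalar coordinate that records which value was selected. The following is a minimal sketch, rebuilding only the meta array from this post, of how one might inspect that coordinate and how sel(..., drop=True) leaves it out. The variable names first and first_dropped are made up for illustration.

```python
import numpy as np
import xarray as xr

raw_labels = np.array(
    [["a", "c"], ["b", "a"], ["a", "b"], ["c", "a"]],
    dtype="<U1",
)
label_metas = xr.DataArray(
    raw_labels,
    dims=("label", "parts"),
    coords={
        "label": ["-".join(x) for x in raw_labels],
        "parts": ["first", "second"],
    },
    name="meta",
)

# Default selection: "parts" is no longer a dimension, but it survives
# as a zero-dimensional coordinate holding the selected value.
first = label_metas.sel(parts="first")
print(first.dims, "parts" in first.coords)

# With drop=True the scalar coordinate is discarded along with the dimension.
first_dropped = label_metas.sel(parts="first", drop=True)
print(first_dropped.dims, "parts" in first_dropped.coords)
```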

Code:
label_info.where(label_info["meta"].sel(parts="first") == "a")

Code:
<xarray.Dataset> Size: 192B
Dimensions:  (label: 4, parts: 2)
Coordinates:
  * label    (label) <U3 48B 'a-c' 'b-a' 'a-b' 'c-a'
  * parts    (parts) <U6 48B 'first' 'second'
Data variables:
    meta     (label, parts) object 64B 'a' 'c' nan nan 'a' 'b' nan nan
    count    (label) float64 32B 95.0 nan 6.0 nan
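
The dtypes in this output differ from those of the original dataset (count went from int64 to float64, meta from <U1 to object), which already suggests that the masked result lives in newly allocated arrays. Below is a minimal sketch of one way to check this, assuming np.shares_memory is an acceptable probe. The counts are copied from the output earlier in this post, and the keep mask is written out by hand for illustration.

```python
import numpy as np
import xarray as xr

labels = ["a-c", "b-a", "a-b", "c-a"]

# Counts copied from the output shown earlier in this post.
counts = xr.DataArray(
    np.array([95, 23, 6, 77], dtype="int64"),
    dims="label",
    coords={"label": labels},
    name="count",
)
# Hand-written mask: True where the first part of the label is "a".
keep = xr.DataArray(
    [True, False, True, False],
    dims="label",
    coords={"label": labels},
)

masked = counts.where(keep)

# NaN cannot be stored in an integer array, so the result is promoted to float.
print(counts.dtype, "->", masked.dtype)

# A different dtype implies a different buffer; np.shares_memory agrees.
print(np.shares_memory(counts.values, masked.values))
```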

The points that do not match the condition are replaced with np.nan, as the docs describe. Does that mean the backing arrays are re-allocated? And if I instead ask for the non-matching regions to be dropped, does that also cause a re-allocation? I cannot tell, because dropping fails with IndexError: dimension coordinate 'parts' conflicts between indexed and indexing objects:

Code:
label_info.where(label_info["meta"].sel(parts="first") == "a", drop=True)

Code:
---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

Cell In[20], line 1
----> 1 label_info.where(label_info["meta"].sel(parts="first") == "a", drop=True)


File ~/miniforge3/envs/xarray-tutorial/lib/python3.11/site-packages/xarray/core/common.py:1225, in DataWithCoords.where(self, cond, other, drop)
   1222     for dim in cond.sizes.keys():
   1223         indexers[dim] = _get_indexer(dim)
-> 1225     self = self.isel(**indexers)
   1226     cond = cond.isel(**indexers)
   1228 return ops.where_method(self, cond, other)


File ~/miniforge3/envs/xarray-tutorial/lib/python3.11/site-packages/xarray/core/dataset.py:2972, in Dataset.isel(self, indexers, drop, missing_dims, **indexers_kwargs)
   2970 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "isel")
   2971 if any(is_fancy_indexer(idx) for idx in indexers.values()):
-> 2972     return self._isel_fancy(indexers, drop=drop, missing_dims=missing_dims)
   2974 # Much faster algorithm for when all indexers are ints, slices, one-dimensional
   2975 # lists, or zero or one-dimensional np.ndarray's
   2976 indexers = drop_dims_from_indexers(indexers, self.dims, missing_dims)


File ~/miniforge3/envs/xarray-tutorial/lib/python3.11/site-packages/xarray/core/dataset.py:3043, in Dataset._isel_fancy(self, indexers, drop, missing_dims)
   3040 selected = self._replace_with_new_dims(variables, coord_names, indexes)
   3042 # Extract coordinates from indexers
-> 3043 coord_vars, new_indexes = selected._get_indexers_coords_and_indexes(indexers)
   3044 variables.update(coord_vars)
   3045 indexes.update(new_indexes)


File ~/miniforge3/envs/xarray-tutorial/lib/python3.11/site-packages/xarray/core/dataset.py:2844, in Dataset._get_indexers_coords_and_indexes(self, indexers)
   2840 # we don't need to call align() explicitly or check indexes for
   2841 # alignment, because merge_variables already checks for exact alignment
   2842 # between dimension coordinates
   2843 coords, indexes = merge_coordinates_without_align(coords_list)
-> 2844 assert_coordinate_consistent(self, coords)
   2846 # silently drop the conflicted variables.
   2847 attached_coords = {k: v for k, v in coords.items() if k not in self._variables}


File ~/miniforge3/envs/xarray-tutorial/lib/python3.11/site-packages/xarray/core/coordinates.py:941, in assert_coordinate_consistent(obj, coords)
    938 for k in obj.dims:
    939     # make sure there are no conflict in dimension coordinates
    940     if k in coords and k in obj.coords and not coords[k].equals(obj[k].variable):
--> 941         raise IndexError(
    942             f"dimension coordinate {k!r} conflicts between "
    943             f"indexed and indexing objects:\n{obj[k]}\nvs.\n{coords[k]}"
    944         )


IndexError: dimension coordinate 'parts' conflicts between indexed and indexing objects:
<xarray.DataArray 'parts' (parts: 2)> Size: 48B
array(['first', 'second'], dtype='<U6')
Coordinates:
  * parts    (parts) <U6 48B 'first' 'second'
vs.
<xarray.Variable ()> Size: 24B
array('first', dtype='<U6')
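
The last lines of the traceback compare the dataset's parts dimension coordinate with a scalar parts value of 'first', which appears to be the scalar coordinate that sel(parts="first") leaves on the condition. The following is a sketch of two possible workarounds under that reading. It is not taken from the thread's answers. The counts are fixed to the values printed earlier so the example is reproducible, and the names cond, filtered and selected are made up for illustration.

```python
import numpy as np
import xarray as xr

raw_labels = np.array(
    [["a", "c"], ["b", "a"], ["a", "b"], ["c", "a"]],
    dtype="<U1",
)
label_metas = xr.DataArray(
    raw_labels,
    dims=("label", "parts"),
    coords={
        "label": ["-".join(x) for x in raw_labels],
        "parts": ["first", "second"],
    },
    name="meta",
)
# Fixed counts (copied from the post's output) instead of random ones.
label_counts = xr.DataArray(
    np.array([95, 23, 6, 77], dtype="int64"),
    dims="label",
    coords={"label": label_metas.coords["label"]},
    name="count",
)
label_info = xr.merge([label_metas, label_counts])

# drop=True in sel() removes the scalar "parts" coordinate from the condition,
# so the condition only carries the "label" coordinate.
cond = label_info["meta"].sel(parts="first", drop=True) == "a"

# Option 1: where(..., drop=True) with the coordinate-free condition.
filtered = label_info.where(cond, drop=True)
print(filtered["label"].values)

# Option 2: index with the plain boolean mask. Nothing is masked with NaN
# here, so the original dtypes are kept.
selected = label_info.isel(label=cond.values)
print(selected["count"].dtype)
```

Option 2 sidesteps where entirely, which may be preferable when the mask only varies along one dimension and the integer dtype of count should be preserved.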