OiO.lk Community platform!

`xarray`: merging two `DataArray`s which have only one shared dimension results in a `Dataset` that lists other dimensions?

  • Thread starter: bzm3r

Some preliminary setup:

Code:
import xarray as xr
import numpy as np

xr.set_options(display_style="text")

Code:
<xarray.core.options.set_options at 0x7f28242cf7d0>

Suppose that I have labels which are composed of two parts: first and second:

Code:
raw_labels = np.array(
    [["a", "c"], ["b", "a"], ["a", "b"], ["c", "a"]],
    dtype="<U1",
)
raw_labels

Code:
array([['a', 'c'],
       ['b', 'a'],
       ['a', 'b'],
       ['c', 'a']], dtype='<U1')

I can easily make an xarray.DataArray that represents this raw information, with informative coordinate labels:

Code:
label_metas = xr.DataArray(
    raw_labels,
    dims=("label", "parts"),
    coords={
        "label": ["-".join(x) for x in raw_labels],
        "parts": ["first", "second"],
    },
    name="meta",
)
label_metas

Code:
<xarray.DataArray 'meta' (label: 4, parts: 2)> Size: 32B
array([['a', 'c'],
       ['b', 'a'],
       ['a', 'b'],
       ['c', 'a']], dtype='<U1')
Coordinates:
  * label    (label) <U3 48B 'a-c' 'b-a' 'a-b' 'c-a'
  * parts    (parts) <U6 48B 'first' 'second'
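Because "parts" is a full dimension of `meta`, with its own index coordinate, you can select along it by name just like along "label". A minimal sketch that re-creates `label_metas` so it runs on its own; `firsts` is a hypothetical name used only for illustration:

```python
import numpy as np
import xarray as xr

raw_labels = np.array(
    [["a", "c"], ["b", "a"], ["a", "b"], ["c", "a"]],
    dtype="<U1",
)
label_metas = xr.DataArray(
    raw_labels,
    dims=("label", "parts"),
    coords={
        "label": ["-".join(x) for x in raw_labels],
        "parts": ["first", "second"],
    },
    name="meta",
)

# Selecting a single entry along "parts" drops that dimension,
# leaving a 1-D array indexed by "label" only.
firsts = label_metas.sel(parts="first")
print(firsts.dims)             # ('label',)
print(firsts.values.tolist())  # ['a', 'b', 'a', 'c']

# Selecting by label instead returns both parts of that one label.
print(label_metas.sel(label="b-a").values.tolist())  # ['b', 'a']
```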

Now suppose that I have additional information for each label; for simplicity, say it is a count.

Code:
raw_counts = np.random.randint(0, 100, size=len(label_metas))
raw_counts

Code:
array([17, 10, 97, 72])

Code:
label_counts = xr.DataArray(
    raw_counts,
    dims="label",
    coords={"label": label_metas.coords["label"]},
    name="count",
)
label_counts

Code:
<xarray.DataArray 'count' (label: 4)> Size: 32B
array([17, 10, 97, 72])
Coordinates:
  * label    (label) <U3 48B 'a-c' 'b-a' 'a-b' 'c-a'
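Reusing `label_metas.coords["label"]` ties each count to a label *name* rather than to an array position, and xarray aligns objects on those index values. A small sketch, assuming (hypothetically) that the same four counts arrive in a different order: `xr.align` with `join="left"` keeps the index of `label_metas` and reorders the counts to match it. `shuffled_counts` and `aligned_counts` are illustrative names, not part of the question:

```python
import numpy as np
import xarray as xr

label_metas = xr.DataArray(
    np.array([["a", "c"], ["b", "a"], ["a", "b"], ["c", "a"]], dtype="<U1"),
    dims=("label", "parts"),
    coords={"label": ["a-c", "b-a", "a-b", "c-a"], "parts": ["first", "second"]},
    name="meta",
)

# Hypothetical input: the counts shown above, listed in reverse label order.
shuffled_counts = xr.DataArray(
    [72, 97, 10, 17],
    dims="label",
    coords={"label": ["c-a", "a-b", "b-a", "a-c"]},
    name="count",
)

# join="left" reuses label_metas' "label" index and reindexes the counts onto it.
_, aligned_counts = xr.align(label_metas, shuffled_counts, join="left")
print(aligned_counts["label"].values.tolist())  # ['a-c', 'b-a', 'a-b', 'c-a']
print(aligned_counts.values.tolist())           # [17, 10, 97, 72]
```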

How do I combine these clearly related xr.DataArrays? As I understand it, by putting them into an xr.Dataset:

Code:
label_info = xr.merge([label_metas, label_counts])
label_info

Code:
<xarray.Dataset> Size: 160B
Dimensions:  (label: 4, parts: 2)
Coordinates:
  * label    (label) <U3 48B 'a-c' 'b-a' 'a-b' 'c-a'
  * parts    (parts) <U6 48B 'first' 'second'
Data variables:
    meta     (label, parts) <U1 32B 'a' 'c' 'b' 'a' 'a' 'b' 'c' 'a'
    count    (label) int64 32B 17 10 97 72
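`xr.merge` is not the only route to this Dataset. Passing the arrays as a dict to the `xr.Dataset` constructor, or promoting one array with `to_dataset()` and adding the other with `assign`, should give an equal result. A minimal sketch, with the counts hard-coded to the values shown above instead of drawn at random; `merged`, `constructed` and `assigned` are illustrative names:

```python
import numpy as np
import xarray as xr

raw_labels = np.array([["a", "c"], ["b", "a"], ["a", "b"], ["c", "a"]], dtype="<U1")
label_metas = xr.DataArray(
    raw_labels,
    dims=("label", "parts"),
    coords={"label": ["-".join(x) for x in raw_labels], "parts": ["first", "second"]},
    name="meta",
)
# Counts hard-coded to the values shown above instead of drawn at random.
label_counts = xr.DataArray(
    [17, 10, 97, 72],
    dims="label",
    coords={"label": label_metas.coords["label"]},
    name="count",
)

merged = xr.merge([label_metas, label_counts])
# Dict keys become the data variable names.
constructed = xr.Dataset({"meta": label_metas, "count": label_counts})
# Promote one array to a Dataset (named after the array), then add the other.
assigned = label_metas.to_dataset().assign(count=label_counts)

print(merged.equals(constructed), merged.equals(assigned))  # True True
```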

It seems odd that the Dataset lists two dimensions, "label" and "parts", because the only dimension that the two xr.DataArrays in it share is "label".

How should one understand that the xr.Dataset has two dimensions "label" and "parts"?
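One way to see what is going on is to compare the Dataset-level sizes with each data variable's own dimensions. A minimal sketch (counts hard-coded rather than random; `broadcast` is an illustrative name): the Dataset reports the union of all its variables' dimensions, while the merge leaves each variable's dimensions unchanged, so `count` is not silently expanded over "parts".

```python
import numpy as np
import xarray as xr

raw_labels = np.array([["a", "c"], ["b", "a"], ["a", "b"], ["c", "a"]], dtype="<U1")
label_metas = xr.DataArray(
    raw_labels,
    dims=("label", "parts"),
    coords={"label": ["-".join(x) for x in raw_labels], "parts": ["first", "second"]},
    name="meta",
)
# Counts hard-coded to the values shown above instead of drawn at random.
label_counts = xr.DataArray(
    [17, 10, 97, 72],
    dims="label",
    coords={"label": label_metas.coords["label"]},
    name="count",
)
label_info = xr.merge([label_metas, label_counts])

# Dataset-level sizes: the union of the dimensions of all its variables.
print(dict(label_info.sizes))  # {'label': 4, 'parts': 2}

# Per-variable dimensions are left unchanged by the merge.
print(label_info["meta"].dims)    # ('label', 'parts')
print(label_info["count"].dims)   # ('label',)
print(label_info["count"].shape)  # (4,)

# Expanding "count" over "parts" only happens when asked for explicitly;
# the result then has both "label" and "parts" dimensions.
broadcast = label_info["count"].broadcast_like(label_info["meta"])
print(dict(broadcast.sizes))
```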