Meta-feature analysis: split data for computation on available memory

Asked by arilwan (Guest)
I am working with the meta-feature extractor package pymfe (https://github.com/ealcobaca/pymfe) for complexity analysis. On a small dataset this works without issue, for example:

Code:
# pip install -U pymfe

from sklearn.datasets import load_iris
from pymfe.mfe import MFE

data = load_iris()
X = data.data
y = data.target

extractor = MFE(features=["t1"], groups=["complexity"],
                summary=["min", "max", "mean", "sd"])
extractor.fit(X, y)
extractor.extract()
# (['t1'], [0.12])

My dataset is large (32690, 80), and this computation gets killed for excessive memory usage. I work on Ubuntu 24.04 with 32 GB of RAM.

To reproduce the scenario:

Code:
from sklearn.datasets import make_classification
from pymfe.mfe import MFE

# Generate the dataset
X, y = make_classification(n_samples=20_000, n_features=80,
                           n_informative=60, n_classes=5, random_state=42)

extractor = MFE(features=["t1"], groups=["complexity"],
                summary=["min", "max", "mean", "sd"])
extractor.fit(X, y)
extractor.extract()
# Killed

Question:

How do I split this task to compute on small partitions of the dataset and then combine the final results (e.g. by averaging)?
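
One way to approach this is sketched below: partition the data into class-stratified chunks that each fit in memory, run the extractor on each chunk, and average the per-chunk values. Two assumptions to flag: the chunk count n_chunks is a tuning knob chosen here for illustration, not anything prescribed by pymfe, and complexity measures such as t1 are not strictly additive, so the averaged value is an estimate of the whole-dataset value rather than an exact equivalent.

Code:
# Minimal sketch: chunked extraction with averaging.
# Assumes averaging per-partition "t1" values is an acceptable
# approximation of the whole-dataset measure (it is an estimate,
# since complexity measures are not additive across partitions).
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.datasets import make_classification
from pymfe.mfe import MFE

X, y = make_classification(n_samples=20_000, n_features=80,
                           n_informative=60, n_classes=5, random_state=42)

n_chunks = 10  # hypothetical value; tune so each partition fits in memory
skf = StratifiedKFold(n_splits=n_chunks, shuffle=True, random_state=42)

chunk_values = []
for _, idx in skf.split(X, y):  # each test fold is one class-balanced chunk
    extractor = MFE(features=["t1"], groups=["complexity"],
                    summary=["min", "max", "mean", "sd"])
    extractor.fit(X[idx], y[idx])
    names, values = extractor.extract()
    chunk_values.append(values)

# Average each extracted measure across partitions, ignoring NaNs
combined = np.nanmean(np.asarray(chunk_values, dtype=float), axis=0)
print(list(zip(names, combined)))

Stratifying the partitions keeps the class proportions of the full dataset in every chunk, which matters because complexity measures are sensitive to class balance; a plain np.array_split over shuffled rows would also work but can skew minority-class fractions in individual chunks.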