Split numpy array into intervals of arbitrary lengths where difference between start and end of each interval is T

Thread starter: Sparsh Garg
The question: given a NumPy array of N float values with minimum X and maximum Y, I would like to split it into chunks such that the start and end of each interval are separated by T.

Code:
For example, if A contains 300 float values with a minimum of 217.4 and a maximum of 221.4 (here T = 0.8), then it should be split into intervals such that
interval 1 contains values between 217.4 and 218.2
interval 2 contains values between 218.2 and 219.0
interval 3 contains values between 219.0 and 219.8
interval 4 contains values between 219.8 and 220.6
and interval 5 contains values between 220.6 and 221.4
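
The interval boundaries listed above can be generated rather than typed by hand. This is a minimal sketch, assuming the range Y - X is a whole multiple of T; the names `x_min`, `x_max`, `T` and `edges` are illustrative and not from the original post.

```python
import numpy as np

x_min, x_max, T = 217.4, 221.4, 0.8  # values from the example above

# Number of intervals; round() guards against floating-point noise in the division.
n_intervals = int(round((x_max - x_min) / T))

# n_intervals + 1 evenly spaced edges, with both endpoints included exactly.
edges = np.linspace(x_min, x_max, n_intervals + 1)

# Pair consecutive edges into (start, end) tuples.
intervals = list(zip(edges[:-1], edges[1:]))

for start, end in intervals:
    print(f"{start:.1f} to {end:.1f}")
```

np.linspace is used here instead of np.arange because it fixes the number of edges up front, which avoids an extra or missing edge when the step is a non-integer float.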

I looked at itertools.batched and also found this solution via ChatGPT:

Code:
import numpy as np

# Example numpy array
arr = np.random.uniform(217.4, 221.4, size=300)  # Generate random data between 217.4 and 221.4

# Define the intervals
intervals = [
    (217.4, 218.2),
    (218.2, 219.0),
    (219.0, 219.8),
    (219.8, 220.6),
    (220.6, 221.4)
]

# Initialize lists to store chunks
chunks = [[] for _ in range(len(intervals))]

# Iterate through the array and distribute values into chunks
for value in arr:
    for i, (start, end) in enumerate(intervals):
        if start <= value < end:
            chunks[i].append(value)
            break  # Exit the inner loop once value is added to a chunk

# Convert lists to numpy arrays
chunks = [np.array(chunk) for chunk in chunks]

# Print the resulting chunks
for i, chunk in enumerate(chunks):
    print(f"Interval {i+1} ({intervals[i]}): {len(chunk)} values")

The problem with this solution is that if I execute the code multiple times, the number of values in each interval keeps changing. For example, the first execution returns 69 values for the first interval, the second gives 56 values, and the third gives 69 values again. I would like to split the array into chunks such that the number of values in each chunk does not change, no matter how many times the code is run.
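
One likely reason for the changing counts is that np.random.uniform draws a fresh array on every execution, so the data itself differs between runs. The sketch below assumes that is the cause and fixes the random seed with np.random.default_rng; the helper name `make_counts` and the seed value 0 are hypothetical choices for illustration.

```python
import numpy as np

def make_counts(seed):
    """Generate the example data from a fixed seed and count values per interval."""
    rng = np.random.default_rng(seed)           # seeded generator gives repeatable data
    arr = rng.uniform(217.4, 221.4, size=300)
    edges = np.linspace(217.4, 221.4, 6)        # 5 intervals of width 0.8
    counts, _ = np.histogram(arr, bins=edges)   # number of values in each interval
    return counts

first = make_counts(0)
second = make_counts(0)
print(first)
print(second)  # same counts as the first call, because the seed is the same
```

With real measurements loaded from a file rather than generated randomly, the same array is binned each time, so the counts would already be stable.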

To summarize: given a NumPy array, I would like to split it into chunks. The chunks may contain an arbitrary number of values, but the difference between the start and end of each interval is always T. The second constraint is that once the split is done, the length of each chunk must not change, no matter how many times the code is run.

First I looked at ways to split a NumPy array, such as batched, split_array and groupby (https://realpython.com/how-to-split-a-python-list-into-chunks/).

Then I asked ChatGPT this question, but its solution did not seem to handle the second constraint, that the length of the chunks must stay fixed no matter how many times the code is run. There is another question on Stack Overflow that gives intervals for a specific category (Find Repeating Intervals of Arbitrary Length of a Series of Categorical Data in Pandas), but it only tells me where the intervals start and end. I want to know the number of values in each interval. That number will tell me, given 300 values, what percentage of the values in the array fall in the interval (217.4, 218.2).
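
The per-interval counts and percentages described above can be computed without an explicit Python loop. This is a rough sketch using np.histogram and np.digitize on a small hand-written array (the values in `arr` are made up for illustration, not taken from the original data).

```python
import numpy as np

# Hypothetical sample data; the last value equals the overall maximum on purpose.
arr = np.array([217.5, 217.9, 218.3, 219.1, 220.0, 220.7, 221.0, 221.4])
edges = np.linspace(217.4, 221.4, 6)  # 5 intervals of width 0.8

# Count of values per interval; np.histogram treats the last interval as closed,
# so a value equal to the maximum (221.4) is counted.
counts, _ = np.histogram(arr, bins=edges)
percentages = 100.0 * counts / arr.size

# Interval index for each value; clip so the maximum lands in the last interval.
idx = np.clip(np.digitize(arr, edges) - 1, 0, len(edges) - 2)
chunks = [arr[idx == i] for i in range(len(edges) - 1)]

for i, (c, p) in enumerate(zip(counts, percentages)):
    print(f"Interval {i + 1} ({edges[i]:.1f}, {edges[i + 1]:.1f}): {c} values, {p:.1f}%")
```

Note that a test of the form start <= value < end leaves out a value exactly equal to the overall maximum, because the last interval excludes its end point; the sketch treats the last interval as closed instead.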