OiO.lk Community platform!

Randomly adding rows from another datatable based on group and a probability

  • Thread starter: Richard Dixon (Guest)

Suppose I have a dataframe called "Main" in the format below. In reality it will have thousands of rows covering many combinations of Region and Type, with each Region/Type combination appearing more than once, but I'm keeping it simple for this example:

Code:
Region  Type    Value
A        1       600
A        2       700
A        2       750
B        1       700
B        1       500
B        2       900

I also have another dataframe, "Prob", that consists of Region, Type and Probability. The probability is always fixed for a given combination of Region and Type, and this table is used as a lookup for "Main" to set the probability discussed further below.

Code:
Region  Type   Probability
A        1       50%
A        2       30%
B        1       50%
B        2       30%

For each row in "Main", I look up the probability from "Prob" based on the Region and Type, and get Python to decide randomly, with that probability, whether to add an extra row to the "Main" dataframe. That extra row is taken at random from another dataframe, "Extras", which has the same form as "Main".

Code:
Region  Type   Value
A        1       600
A        1       300
A        2       700
A        2       950
B        1       700
B        1        50
B        2       900
B        2       300
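
The lookup and the random decision described above could be sketched with a merge followed by one uniform draw per row. This is only a sketch under assumptions: the helper column names `p` and `lucky` are made up for illustration, and the Probability column is assumed to be stored as strings such as "50%".

```python
import numpy as np
import pandas as pd

main = pd.DataFrame({
    "Region": ["A", "A", "A", "B", "B", "B"],
    "Type": [1, 2, 2, 1, 1, 2],
    "Value": [600, 700, 750, 700, 500, 900],
})
prob = pd.DataFrame({
    "Region": ["A", "A", "B", "B"],
    "Type": [1, 2, 1, 2],
    "Probability": ["50%", "30%", "50%", "30%"],
})

# Assumption: probabilities are stored as "50%"-style strings; turn them into fractions.
prob["p"] = prob["Probability"].str.rstrip("%").astype(float) / 100

# A left merge keeps the row order of main and attaches p to every row.
looked_up = main.merge(prob[["Region", "Type", "p"]], on=["Region", "Type"], how="left")

# One uniform draw per row; a row is "lucky" when its draw falls below p.
rng = np.random.default_rng(seed=0)  # seeded only so the sketch is repeatable
looked_up["lucky"] = rng.random(len(looked_up)) < looked_up["p"]
```

A Region/Type pair that is missing from "Prob" ends up with `p` as NaN, and the comparison then comes out False, so such rows are never lucky.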

But: I can only pull a row from "Extras" that has the same Region and Type as the row in "Main". Once I've taken that row, I remove it from "Extras" before the calculation for the next row in "Main", as I don't want to pull the same row twice (in reality these dataframes will be much larger).
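
Taken literally, the "pull one matching row, then remove it" step might look like the helper below. The function name `take_extra` and its return convention are hypothetical, chosen only for this illustration.

```python
import numpy as np
import pandas as pd

extras = pd.DataFrame({
    "Region": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Type": [1, 1, 2, 2, 1, 1, 2, 2],
    "Value": [600, 300, 700, 950, 700, 50, 900, 300],
})

def take_extra(extras, region, type_, rng):
    """Return (row, remaining_extras); row is None when no matching row is left."""
    mask = (extras["Region"] == region) & (extras["Type"] == type_)
    candidates = extras.index[mask].to_numpy()
    if len(candidates) == 0:
        return None, extras
    chosen = rng.choice(candidates)  # pick one index label at random
    # drop() returns a new frame, so the caller keeps using the returned one.
    return extras.loc[chosen], extras.drop(index=chosen)

rng = np.random.default_rng(seed=0)  # seeded only so the sketch is repeatable
row, extras_left = take_extra(extras, "A", 2, rng)
```

Calling this once per lucky row in a loop works, but each call scans and copies "Extras", so it is likely to be slow on large frames.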

I'm trying to get my head around what I assume is a multi-step process (possibly a loop?) that can do this fairly simply. In the end I'll have a dataframe that looks something like this:

Code:
Region  Type    Value
A        1       600
A        2       700
A        2       750
B        1       700
B        1       500
B        2       900
A        2       950

...where, of the six rows in "Main", one Region A, Type 2 row was "lucky": for that row we perform the add-row calculation, pull the A / 2 / 950 row from "Extras", and add it to "Main" (whilst removing it from "Extras").
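
One possible way to put the whole process together, without looping over every row of "Main", is to count the lucky rows per Region/Type group and then draw that many rows from the matching group of "Extras" without replacement. This is a sketch, not a definitive answer. It assumes the probabilities are already fractions in a column named `p`, that "Extras" has a unique index, and that a group simply stops supplying rows once it is used up. The function name `add_extras` is hypothetical.

```python
import numpy as np
import pandas as pd

KEYS = ["Region", "Type"]

def add_extras(main, prob, extras, rng):
    """Return (main plus the drawn rows, extras that were not drawn)."""
    # Look up p for every row of main and make one random decision per row.
    p = main.merge(prob, on=KEYS, how="left")["p"].to_numpy()
    lucky = rng.random(len(main)) < p

    # How many extra rows each Region/Type group asks for.
    wanted = main[lucky].groupby(KEYS).size()

    # Index labels of Extras, keyed by (Region, Type).
    pools = extras.groupby(KEYS).groups

    taken = []
    for key, n in wanted.items():
        pool = pools.get(key)
        if pool is None:
            continue  # no extras exist for this combination
        k = min(int(n), len(pool))  # cannot take more rows than are available
        taken.extend(rng.choice(pool.to_numpy(), size=k, replace=False))

    result = pd.concat([main, extras.loc[taken]], ignore_index=True)
    return result, extras.drop(index=taken)

main = pd.DataFrame({"Region": list("AAABBB"), "Type": [1, 2, 2, 1, 1, 2],
                     "Value": [600, 700, 750, 700, 500, 900]})
prob = pd.DataFrame({"Region": list("AABB"), "Type": [1, 2, 1, 2],
                     "p": [0.5, 0.3, 0.5, 0.3]})
extras = pd.DataFrame({"Region": list("AAAABBBB"), "Type": [1, 1, 2, 2, 1, 1, 2, 2],
                       "Value": [600, 300, 700, 950, 700, 50, 900, 300]})

new_main, extras_left = add_extras(main, prob, extras, np.random.default_rng())
```

The loop here runs once per Region/Type group rather than once per row. Because every draw within a group is without replacement, no row of "Extras" is used twice, which is the same constraint as removing rows one at a time. The added rows are appended group by group, so their order may differ from a strict row-by-row loop.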