Advice on data structure for an analysis and visualisation tool

  • Thread starter: user1505631 (Guest)
I am writing a browser-based tool to manipulate and visualise data (with D3.js). Currently, I store data in a JSON format, where each table is an object and columns are arrays, e.g.:

Code:
{"data":{"timeString":{"name":"time","type":"time","data":["6/21/2024, 12:00:00 AM","6/21/2024, 12:15:00 AM"], "timeFormat":"M/D/YYYY hh:mm:ss AP","timeMins":["0.0000","0.2500"], "value0":{"name":"value0","type":"value","data":[5,1],"value1":{"name":"value1","type":"value","data":[4,2]}}}}}

Because I want to manipulate the data, I have chosen to keep a manipulated set as well, e.g.:

Code:
{"data":{"timeString":{"name":"time","type":"time","data":["6/21/2024, 12:00:00 AM","6/21/2024, 12:15:00 AM"], "timeFormat":"M/D/YYYY hh:mm:ss AP","timeMins":["0.0000","0.2500"], "value0":{"name":"value0","type":"value","data":[5,1],"value1":{"name":"value1","type":"value","data":[4,2],"processSteps":[{"add":[5]}],"processedData":[9,7]}}}}}

This costs extra memory, but improves performance. Likewise, each graph keeps a similar copy of its data: another memory overhead that improves performance.
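
To illustrate the caching idea, here is a minimal sketch (not the tool's actual code) of how a column's processSteps might be applied once and stored on processedData, so a re-render only reads the cached array. The function name and the handling of step types other than "add" are assumptions:

Code:
function applyProcessSteps(column) {
  // keep the raw data untouched and work on a copy
  let out = column.data.slice();
  for (const step of column.processSteps || []) {
    if (step.add) {
      const [amount] = step.add;
      out = out.map(v => v + amount); // e.g. {"add":[5]} adds 5 to every value
    }
    // ...other step types would be handled here
  }
  column.processedData = out; // cached: a re-render reads this directly
  return column.processedData;
}

// Using the value1 column from the example above:
// data [4, 2] with processSteps [{"add":[5]}] gives processedData [9, 7]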

Some of my datasets are now growing to roughly 50k rows. I've had to modify array functions (like Math.min(...arr)) because maximum call stack size errors crept in.
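
The call stack errors come from the spread operator passing every element as a separate argument; a plain loop avoids that. A minimal sketch of the kind of replacement I mean:

Code:
// Math.min(...arr) can exceed the maximum call stack size on large arrays,
// because every element becomes a function argument. A loop has no such limit.
function arrayMin(arr) {
  let min = Infinity;
  for (let i = 0; i < arr.length; i++) {
    if (arr[i] < min) min = arr[i];
  }
  return min;
}
// arrayMin(largeColumn) behaves like Math.min(...largeColumn) but is safe at 50k+ rows.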

This has made me reconsider whether my approach is the most efficient. I have thought about a single, row-oriented JSON instead, like:

Code:
{"data":{"names":["time","value0","value1"],"types":["time","value","Value"], "timeFormat":["D/M/YYYY hh:mm:ss AP",-1,-1],"data":[{"time":"6/21/2024, 12:00:00 AM","timeMins":0.0000, "value0":5, "value1":4}, {"time":"6/21/2024, 12:15:00 AM","timeMins":0.2500, "value0":1, "value1":2}],"processSteps":{"value0":[{"add":[5]}]},"processedData":[{"value0":9},{"value0":7}]}}

I've considered removing the duplicated processed data, but I haven't yet run into memory issues, and the efficiency saving is large, given that a visualisation re-render doesn't need to recalculate the processing steps.

I've also thought about using IndexedDB, but that seems to add a layer of complexity for not much apparent gain.

Does anyone have advice on how best to architect the data for memory and performance in this use case?

