How to parse and capture any measurement unit

  • Thread starter: Domino (Guest)

In my application, users can customize measurement units, so if they want to work in decimeters instead of inches or in full turns instead of degrees, they can. However, I need a way to parse a string containing multiple values and units, such as 1' 2" 3/8. I've looked at a few regular expressions on Stack Overflow, but none of them matched all cases of the imperial system, let alone arbitrary units. My objective is to make the input box as permissive as possible.

So my question is: how can I extract multiple value-unit pairs from a string in the most user-friendly way?



I came up with the following algorithm:

  1. Check for illegal characters and throw an error if needed.
  2. Trim leading and trailing spaces.
  3. Split the string into parts wherever a non-digit character is followed by a digit, unless that non-digit character is a period, a comma or a slash, which mark decimals and fractions.
  4. Remove all spaces from each part, check for character misuse (multiple decimal points or fraction bars) and replace '' (two single quotes) with " (a double quote).
  5. Split value and unit-string for each part. If a part has no unit:
    • If it is the first part, use the default unit.
    • Else if it is a fraction, consider it as the same unit as the previous part.
    • Otherwise, treat it as in, cm or mm, depending on the previous part's unit.
    • If it isn't the first part and there's no way to guess the unit, throw an error.
  6. Check that the units are recognized, all belong to the same system (metric or imperial) and appear in descending order (ft > in > fraction, or m > cm > mm > fraction); throw an error if not.
  7. Convert and sum all parts, performing division in the process.
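
For illustration, the seven steps above could be sketched in Python roughly as follows. This is only one possible reading of the algorithm, not a finished parser: the names UNITS, NEXT_UNIT and parse_length are made up for the example, the unit table covers only the units mentioned in this post, the default unit is assumed to be inches, and totals are expressed in inches (imperial) or millimetres (metric). It relies on re.split accepting zero-width patterns, which needs Python 3.7 or later.

```python
import re
from fractions import Fraction

# Hypothetical unit table: system name and factor to that system's base unit
# (inches for imperial, millimetres for metric).
UNITS = {
    "'": ("imperial", Fraction(12)),
    '"': ("imperial", Fraction(1)),
    "m": ("metric", Fraction(1000)),
    "cm": ("metric", Fraction(10)),
    "mm": ("metric", Fraction(1)),
}
# Unit guessed for an unlabeled whole value that follows a labeled part.
NEXT_UNIT = {"'": '"', "m": "cm", "cm": "mm"}


def parse_length(text, default_unit='"'):
    """Return (system, total in base unit) for input such as 1' 2" 3/8."""
    # Step 1: reject characters that can never be part of a value or unit.
    if re.search(r"[^\d\s.,/'\"a-z]", text):
        raise ValueError("illegal character in input")
    # Steps 2 and 3: trim, then split before each digit that follows a
    # non-digit other than '.', ',' or '/'.
    parts = re.split(r"(?<=[^\d.,/])(?=\d)", text.strip())
    total, system, prev_unit, prev_factor = Fraction(0), None, None, None
    for index, raw in enumerate(parts):
        # Step 4: drop whitespace and normalise '' to ".
        part = re.sub(r"\s+", "", raw).replace("''", '"')
        # Step 5: split value, optional denominator and unit string; a second
        # decimal point or fraction bar makes the match fail.
        match = re.fullmatch(r"(\d+(?:[.,]\d+)?)(?:/(\d+))?(\D*)", part)
        if match is None:
            raise ValueError(f"malformed part: {raw!r}")
        number, denominator, unit = match.groups()
        is_fraction = denominator is not None
        if is_fraction and int(denominator) == 0:
            raise ValueError("fraction with a zero denominator")
        if not unit:
            if index == 0:
                unit = default_unit
            elif is_fraction:
                unit = prev_unit
            else:
                unit = NEXT_UNIT.get(prev_unit)
        if unit not in UNITS:
            raise ValueError(f"unknown or missing unit in part: {raw!r}")
        part_system, factor = UNITS[unit]
        # Step 6: one system only, and units in descending order; a fraction
        # may reuse the unit of the part before it.
        if system not in (None, part_system):
            raise ValueError("mixed metric and imperial units")
        if prev_factor is not None and not (
            factor < prev_factor or (is_fraction and factor == prev_factor)
        ):
            raise ValueError("units must be in descending order")
        # Step 7: convert to the base unit and accumulate.
        value = Fraction(number.replace(",", "."))
        if is_fraction:
            value /= int(denominator)
        total += value * factor
        system, prev_unit, prev_factor = part_system, unit, factor
    return system, total
```

With these assumptions, parse_length("1' 2\" 3/8") returns ("imperial", Fraction(115, 8)), i.e. 14 3/8 inches. Using Fraction instead of float keeps values such as 3/8 exact until the final conversion.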

I guess I could use string manipulation functions to do most of this, but I feel like there must be a simpler way through regex.



I came up with a regex:
((\d+('|''|"|m|cm|mm|\s|$) *)+(\d+(\/\d+)?('|''|"|m|cm|mm|\s|$) *)?)|((\d+('|''|"|m|cm|mm|\s) *)*(\d+(\/\d+)?('|''|"|m|cm|mm|\s|$) *))

It only allows a fraction at the end, and it allows spaces between values. I've never used regex capturing, though, so I'm not sure how I'll extract the values out of this mess. I'll work on this again tomorrow.
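
On the capturing question: in Python's re module, a group inside a repeated group keeps only its last match, so one large validating pattern is awkward for pulling out every pair. A common workaround is to validate separately and then iterate over a small per-token pattern. The sketch below assumes Python; TOKEN and extract_pairs are hypothetical names, and the token pattern is a simplification, not the regex from this post.

```python
import re

# Simplified per-token pattern (an assumption, not the regex from the post):
# a whole number, an optional /denominator, then an optional unit.
# Longer alternatives come first so that "mm" is not read as "m".
TOKEN = re.compile(
    r"""(?P<value>\d+(?:/\d+)?)      # 12 or 3/8
        \s*
        (?P<unit>''|'|"|mm|cm|m)?    # optional unit
    """,
    re.VERBOSE,
)


def extract_pairs(text):
    """Return a list of (value, unit) string pairs; unit is '' when absent."""
    return [
        (m.group("value"), m.group("unit") or "")
        for m in TOKEN.finditer(text)
    ]


print(extract_pairs("1' 2\" 3/8"))
# prints [('1', "'"), ('2', '"'), ('3/8', '')]
```

Named groups keep the extraction readable, and finditer returns the matches in input order, so the unit-guessing rules from the algorithm can then be applied to the resulting list.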

 
