icu: Sort strings based on 2 different locales

  • Thread starter: saeedgnu (Guest)

As you probably know, the alphabetical order of some (maybe most) languages differs from the code point order of their letters in Unicode. That's why we may want to use icu.Collator to sort, like this Python example:

Code:
from icu import Collator, Locale
collator = Collator.createInstance(Locale("fa_IR.UTF-8"))
mylist.sort(key=collator.getSortKey)

This works perfectly for Persian strings. But it also sorts all Persian strings before all ASCII / English strings, which is the opposite of plain Unicode code point order (where ASCII comes first).

What if we want to sort ASCII before this given locale?

Or ideally, I want to sort by two or more locales (for example, by giving multiple Locale arguments to Collator.createInstance).

If we could tell collator.getSortKey to return empty bytes for strings that belong to other locales, then I could build a tuple of two collator.getSortKey() results, for example:

Code:
from icu import Collator, Locale

collator1 = Collator.createInstance(Locale("en_US.UTF-8"))
collator2 = Collator.createInstance(Locale("fa_IR.UTF-8"))

def sortKey(s):
    return collator1.getSortKey(s), collator2.getSortKey(s)

mylist.sort(key=sortKey)

But it looks like getSortKey always returns non-empty bytes.
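
One possible workaround for the tuple idea above, as a rough sketch rather than a confirmed solution: since getSortKey never returns empty bytes, the first element of the tuple can be a plain "is this string non-ASCII?" flag, so that ASCII strings land in their own bucket before everything else, and the locale key only decides the order inside each bucket. The names ascii_first_key, utf8_key and sort_ascii_first are made up for this illustration, and utf8_key is only a stand-in so the snippet runs without PyICU. With PyICU installed, collator.getSortKey from the examples above would presumably be passed in its place.

```python
from typing import Callable, Iterable, List, Tuple


def ascii_first_key(locale_key: Callable[[str], bytes]) -> Callable[[str], Tuple[bool, bytes]]:
    """Wrap a sort-key function so that pure-ASCII strings sort before all others."""
    def key(s: str) -> Tuple[bool, bytes]:
        # False sorts before True, so ASCII-only strings form the first bucket.
        # Inside each bucket, the wrapped key function decides the order.
        return (not s.isascii(), locale_key(s))
    return key


def utf8_key(s: str) -> bytes:
    # Hypothetical stand-in for collator.getSortKey (assumption: any function
    # mapping str -> bytes can be used here). It just compares UTF-8 bytes.
    return s.casefold().encode("utf-8")


def sort_ascii_first(items: Iterable[str],
                     locale_key: Callable[[str], bytes] = utf8_key) -> List[str]:
    """Return a new list with ASCII strings first, each bucket ordered by locale_key."""
    return sorted(items, key=ascii_first_key(locale_key))


if __name__ == "__main__":
    mylist = ["سلام", "banana", "آب", "Apple"]
    print(sort_ascii_first(mylist))
```

Note that a string mixing both scripts falls into the non-ASCII bucket with this flag. ICU also has a script reordering setting (the kr / colReorder collation keyword), which may achieve the same thing inside a single collator; that is not shown here.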