Sunday, September 26, 2010

How To Do Sorting In Languages Other Than English In Java?

Sorting is a tricky business in any programming language, and in data-heavy applications it can account for a large share of the CPU time. We all have a native language, such as Hindi, Chinese, Japanese, or French. Most sorting code deals only with plain English words, but other languages matter more and more, and handling them correctly is a core part of internationalization.
Here is a typical attempt at sorting a list that contains French words, and the blunder that comes with it. The sample list mixes accented and unaccented words:

String[] names = {"fácil", "facil", "fast","Où", "êtes-vous", "spécifique", "specific", "ou"};

and here is the typical sorting code:

// Requires: import java.util.*;
String[] names = {"fácil", "facil", "fast", "Où", "êtes-vous", "spécifique", "specific", "ou"};
List<String> list = Arrays.asList(names);
// Natural String ordering: compares UTF-16 code units, not locale rules.
Collections.sort(list);
for (String name : list) {
    System.out.print(name + " ");
}

And the result:
Actual:   Où facil fast fácil ou specific spécifique êtes-vous
Expected: êtes-vous facil fácil fast Où ou specific spécifique

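The actual order is wrong by French rules, because String's natural ordering compares Unicode code unit values and knows nothing about French collation. Uppercase ASCII letters sort before all lowercase ones, and accented letters such as á and ê sort after z.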
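To see where the wrong order comes from, here is a minimal sketch that compares a few of the words by code unit value. The class name CodeUnitOrderDemo and the helper naturalSort are made up for this illustration. The accented letters are written as Unicode escapes (\u00F9 is ù, \u00E1 is á) so the example does not depend on the encoding of the source file.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original post.
class CodeUnitOrderDemo {
    // Returns a sorted copy using String's natural ordering (UTF-16 code units).
    static String[] naturalSort(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String ouCapital = "O\u00F9";        // capital O, u-grave
        String facilAccented = "f\u00E1cil"; // f, a-acute, c, i, l

        // 'O' is U+004F and 'f' is U+0066, so the capitalized word sorts first.
        System.out.println(ouCapital.compareTo("facil") < 0);    // true
        // a-acute is U+00E1 and 's' is U+0073, so the accented word sorts after "fast".
        System.out.println(facilAccented.compareTo("fast") > 0); // true

        // The same effect on a small list: capital first, accented word after "fast".
        String[] sorted = naturalSort("facil", facilAccented, "fast", ouCapital, "ou");
        System.out.println(Arrays.toString(sorted));
    }
}
```

The comparison stops at the first differing character, so a single capital or accented letter is enough to push a word to the wrong end of the list.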

The remedy: Java provides the Collator class in the java.text package, which performs locale-sensitive string comparison. Here is the code:


import java.text.Collator;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CollatorTest {
    public static void main(String[] args) {
        String[] names = {"fácil", "facil", "fast", "Où", "êtes-vous", "spécifique", "specific", "ou"};
        List<String> list = Arrays.asList(names);

        // 1. Natural ordering: compares UTF-16 code units, ignoring locale rules.
        Collections.sort(list);
        printAll(list);

        // 2. French collator at PRIMARY strength: only base letters matter.
        //    Words that differ only by accent or case tie, and the stable sort
        //    keeps them in their previous relative order.
        Collator myCollator = Collator.getInstance(Locale.FRENCH);
        myCollator.setStrength(Collator.PRIMARY);
        Collections.sort(list, myCollator);
        printAll(list);

        // 3. TERTIARY strength: accent and case differences break the ties.
        myCollator.setStrength(Collator.TERTIARY);
        Collections.sort(list, myCollator);
        printAll(list);
    }

    // Prints the words on one line, separated by spaces.
    private static void printAll(List<String> words) {
        for (String word : words) {
            System.out.print(word + " ");
        }
        System.out.println();
    }
}


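And here is the result. The first line is the natural String order, the second is the French collator at PRIMARY strength, and the third is the French collator at TERTIARY strength: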


Où  facil  fast  fácil  ou  specific  spécifique  êtes-vous


êtes-vous  facil  fácil  fast  Où  ou  specific  spécifique


êtes-vous  facil  fácil  fast  ou  Où  specific  spécifique
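The only difference between the last two lines is the order of "Où" and "ou", and it comes from the collator strength. Below is a small sketch that compares the pairs directly instead of sorting. The class name StrengthDemo and the helper compareAt are made up for this illustration, and the accented letters are again written as Unicode escapes (\u00F9 is ù, \u00E1 is á).

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class, not part of the original post.
class StrengthDemo {
    // Compares two strings with a French collator at the given strength.
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.FRENCH);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        String ouCapital = "O\u00F9";        // capital O, u-grave
        String facilAccented = "f\u00E1cil"; // f, a-acute, c, i, l

        // PRIMARY: only base letters count, so the two words tie (prints 0).
        System.out.println(compareAt(Collator.PRIMARY, "ou", ouCapital));
        // TERTIARY: accent and case differences count, so plain "ou" sorts first (prints true).
        System.out.println(compareAt(Collator.TERTIARY, "ou", ouCapital) < 0);

        // The same pattern for the other accented pair.
        System.out.println(compareAt(Collator.PRIMARY, "facil", facilAccented));      // 0
        System.out.println(compareAt(Collator.TERTIARY, "facil", facilAccented) < 0); // true
    }
}
```

At PRIMARY strength the tied words stay in whatever order they already had, because Collections.sort is a stable sort. That is why "Où" still comes before "ou" on the second line and the two only swap at TERTIARY strength.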
