Monday, September 16, 2013

Java String.toUpperCase() ...what the?

Just the other day I ran into a strange bug. I had to build a string of characters, and the host system I was communicating with used char 254 as a delimiter. I built my string, upper-cased it with String.toUpperCase(), and sent it to the host. On the host I was receiving char 222 as the delimiter! After scratching my head and looking into it more deeply, it seemed odd that

hex : FE, binary: 11111110

was turning into

hex: DE, binary: 11011110

I tried passing both Locale.getDefault() and Locale.ENGLISH to toUpperCase(), to no avail.
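For anyone who wants to see it happen, here is a minimal sketch that reproduces the symptom. The class name, the method name and the "abc"/"def" payload are made up for illustration; only the delimiter value (char 254) comes from my actual code.

```java
import java.util.Locale;

public class DelimiterRepro {
    // Delimiter used by the host system: char 254 (hex FE).
    static final char DELIMITER = (char) 254;

    // Hypothetical record layout: two fields separated by the delimiter.
    static String buildRecord() {
        return "abc" + DELIMITER + "def";
    }

    public static void main(String[] args) {
        String record = buildRecord();
        String upper = record.toUpperCase(Locale.ENGLISH);

        // The delimiter sits at index 3 in both strings.
        System.out.println((int) record.charAt(3)); // prints 254
        System.out.println((int) upper.charAt(3));  // prints 222
    }
}
```

The two values differ only in bit 5 (value 32), which is the same bit that separates 'a' from 'A' in ASCII.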

Could it be that the implementation of String.toUpperCase() applies a mask to ALL chars except specific hard-coded ones? I had no clue, but I wrote the following to get around the problem:

public static String toUpperCase(String input) {
    char[] chars = input.toCharArray();

    for (int i = 0; i < chars.length; ++i) {
        // Only touch ASCII lowercase letters a-z (97-122).
        if (chars[i] > 96 && chars[i] < 123) {
            // Clear bit 5 (value 32) to get the uppercase letter.
            chars[i] &= 223;
        }
    }

    return new String(chars);
}

Lowercase ASCII letters a-z have the values 97-122, so I just bitwise AND them with 223 (11011111) to get the uppercase equivalents.

UPDATE: It seems Java was working perfectly well and the problem was my understanding. Char 254 is a real character: þ (Latin small letter thorn). Its uppercase form, Þ, is char 222! See the following: http://www.scism.lsbu.ac.uk/jfl/Appa/appa4.html
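To convince myself, a small side-by-side check helps. This is only a sketch: the class and method names are hypothetical, asciiUpper is the same masking idea as my workaround, and Locale.ROOT is my choice here to keep the built-in call independent of the machine's default locale.

```java
import java.util.Locale;

public class AsciiUpperDemo {
    // ASCII-only upper-casing: leaves every char outside a-z untouched.
    static String asciiUpper(String input) {
        char[] chars = input.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] >= 'a' && chars[i] <= 'z') {
                chars[i] &= 0xDF; // 223: clear bit 5 (value 32)
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        char thorn = (char) 254; // U+00FE, a real lowercase letter
        String record = "abc" + thorn + "def";

        // Java treats char 254 as a letter and maps it to char 222.
        System.out.println((int) Character.toUpperCase(thorn));                 // prints 222
        System.out.println((int) record.toUpperCase(Locale.ROOT).charAt(3));    // prints 222

        // The ASCII-only version leaves the delimiter alone.
        System.out.println((int) asciiUpper(record).charAt(3));                 // prints 254
    }
}
```

So the workaround is still useful when the data is really bytes in disguise, such as a protocol delimiter, but it will not upper-case accented or other non-ASCII letters.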
