Home > Perl, Programming, Tech > Perl and Unicode

Perl and Unicode

Great article on Perl and Unicode. http://www.perlmonks.org/?node_id=551676

The days of just flinging strings around are over. It’s well established that modern programs need to be capable of communicating funny accented letters, and things like euro symbols. This means that programmers need new habits. It’s easy to program Unicode capable software, but it does require discipline to do it right.

There’s a lot to know about character sets, and text encodings. It’s probably best to spend a full day learning all this, but the basics can be learned in minutes.

These are not the very basics, though. It is assumed that you already know the difference between bytes and characters, and realise (and accept!) that there are many different character sets and encodings, and that your program has to be explicit about them. Recommended reading is “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)” by Joel Spolsky, at http://joelonsoftware.com/articles/Unicode.html.

A word of caution,  you don’t really need to care that Perl stores strings internally as UTF-8:

Please, unless you’re hacking the internals, or debugging weirdness, don’t think about the UTF-8 flag at all. That means that you very probably shouldn’t use is_utf8_utf8_on or _utf8_off at all.

Perl’s internal format happens to be UTF-8. Unfortunately, Perl can’t keep a secret, so everyone knows about this. That is the source of much confusion. It’s better to pretend that the internal format is some unknown encoding, and that you always have to encode and decode explicitly.

Advertisements
  1. January 25, 2012 at 10:32 AM

    good post
    thanks.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s