Just some random musings on Unicode from Joel's blog:
Back in the semi-olden days, everything was very simple.  EBCDIC was on its way out.  The only 
characters that mattered were good old unaccented English letters, and 
we had a code for them called ASCII which was able to represent every character using a number between 32 and 127. Space was 32, the letter "A" was 65, etc.
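To make the numbering concrete, here is a minimal Python sketch of the character-to-number mapping described above. The helper name `is_ascii_encodable` is made up for illustration; `ord`, `chr` and `str.encode` are standard Python built-ins.

```python
# ASCII assigns each character a small integer.
# ord() gives the number for a character, chr() goes the other way.
space_code = ord(" ")
a_code = ord("A")
print(space_code, a_code)  # 32 65
print(chr(65))             # A


def is_ascii_encodable(text):
    """Hypothetical helper: True if every character in text fits in ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# Unaccented English letters fit; accented ones do not.
print(is_ascii_encodable("resume"))            # True
print(is_ascii_encodable("r\u00e9sum\u00e9"))  # False ("résumé")
```

The second check hints at why everything that follows was needed: the moment a letter carries an accent, plain ASCII has no number for it.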
The IBM-PC had something that came to be known as the OEM character set 
which provided some accented characters for European languages and a bunch of line drawing characters: 
horizontal bars, vertical bars, horizontal bars with little 
dingle-dangles dangling off the right side, etc., and you could use 
these line drawing characters to make spiffy boxes and lines on the 
screen, which you can still see running on the 8088 computer at your dry 
cleaners'.
In fact 
as soon as people started buying PCs outside of America all kinds of 
different OEM character sets were dreamed up, which all used the top 128
 characters for their own purposes. For example on some PCs the 
character code 130 would display as é, but on computers sold in Israel 
it was the Hebrew letter Gimel (ג), so when Americans would send their résumés to Israel they would arrive as rגsumגs.  (Love this analogy!!)
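The résumé mix-up is easy to reproduce. The sketch below assumes that Python's `cp437` codec stands in for the original US IBM-PC character set and `cp862` for the one used on PCs sold in Israel; the variable names are only for illustration.

```python
# The same byte, 130, means different things under different OEM character sets.
raw = bytes([130])
us_char = raw.decode("cp437")       # US IBM-PC set: 'é' (U+00E9)
israel_char = raw.decode("cp862")   # Hebrew DOS set: Gimel (U+05D2)
print(us_char, israel_char)

# An American writes "résumé" and the bytes are stored using the US set...
resume_bytes = "r\u00e9sum\u00e9".encode("cp437")
print(list(resume_bytes))  # [114, 130, 115, 117, 109, 130]

# ...and a PC in Israel interprets those same bytes with its own set.
garbled = resume_bytes.decode("cp862")
print(garbled)  # 'r', Gimel, 's', 'u', 'm', Gimel
```

Nothing is lost or corrupted in transit here. The bytes are identical on both machines; only the table used to turn them back into letters differs.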
Eventually this OEM free-for-all got codified in the ANSI standard. In 
the ANSI standard, everybody agreed on what to do below 128, which was 
pretty much the same as ASCII, but there were lots of different ways to 
handle the characters from 128 and on up, depending on where you lived. 
These different systems were called code pages. 
So for example in Israel DOS used a code page called 862, while Greek 
users used 737. They were the same below 128 but different from 128 up, 
where all the funny letters resided.
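One way to check the "same below 128, different above" claim is to decode every byte value under both code pages and compare. This is a rough sketch that assumes Python's `cp862` and `cp737` codecs match the DOS code pages of the same numbers; `decode_byte` is a made-up helper name.

```python
def decode_byte(value, codec):
    """Hypothetical helper: decode a single byte value with the given codec.

    errors="replace" is only a safety net in case a codec leaves a slot undefined.
    """
    return bytes([value]).decode(codec, errors="replace")


# Below 128 the two code pages should agree on every value.
low_half_same = all(
    decode_byte(b, "cp862") == decode_byte(b, "cp737") for b in range(128)
)
print(low_half_same)  # True

# From 128 up, collect the values where they disagree.
high_half_differs = [
    b for b in range(128, 256)
    if decode_byte(b, "cp862") != decode_byte(b, "cp737")
]
print(len(high_half_differs) > 0)  # True

# Byte 128 is Hebrew Alef on one code page and Greek Alpha on the other.
print(decode_byte(128, "cp862"), decode_byte(128, "cp737"))
```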
Almost every stupid "my website looks like gibberish" or "she can't read
 my emails when I use accents" problem comes down to one naive 
programmer who didn't understand the simple fact that if you don't tell 
me whether a particular string is encoded using UTF-8 or ASCII or ISO 
8859-1 (Latin 1) or Windows 1252 (Western European), you simply cannot 
display it correctly or even figure out where it ends. There are over a 
hundred encodings and above code point 127, all bets are off.
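As a small illustration of why the encoding has to be stated, the sketch below decodes one byte string under several of the encodings named above. The sample word and the helper `decode_or_none` are assumptions made for the example; the codec names are the ones Python ships with.

```python
def decode_or_none(data, encoding):
    """Hypothetical helper: decode data, or return None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# "café" stored as UTF-8: the é becomes the two bytes 0xC3 0xA9.
data = "caf\u00e9".encode("utf-8")

as_utf8 = decode_or_none(data, "utf-8")      # correct: 'café'
as_latin1 = decode_or_none(data, "latin-1")  # wrong: 'cafÃ©'
as_cp1252 = decode_or_none(data, "cp1252")   # wrong: 'cafÃ©'
as_ascii = decode_or_none(data, "ascii")     # fails: bytes above 127 are not ASCII
print(as_utf8, as_latin1, as_cp1252, as_ascii)

# Even two close relatives disagree above 127: byte 0x80 is the euro sign
# in Windows 1252 but an invisible control character in ISO 8859-1.
print(b"\x80".decode("cp1252") == "\u20ac")  # True
print(b"\x80".decode("latin-1") == "\x80")   # True
```

Note that the two wrong decodings raise no error at all. They silently produce plausible-looking garbage, which is why this kind of bug tends to reach the reader instead of being caught by the program.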
About the author. I’m Joel Spolsky, co-founder of Fog Creek Software, a New York company that proves that you can treat programmers well and still be highly profitable. Programmers get private offices, free lunch, and work 40 hours a week. Customers only pay for software if they’re delighted.
We make Trello, easy web-based collaboration software, FogBugz, an enlightened bug tracking and software development tool, and Kiln, a distributed source control system that will blow your socks off.
I’m also the co-founder and CEO of Stack Exchange.