# MCJ 178: February 2013

Web standards for Web development

Web standards include many interdependent standards and specifications, some of which govern aspects of the Internet, not just the World Wide Web. Even when not web-focused, such standards directly or indirectly affect the development and administration of web sites and web services. Considerations include the interoperability, accessibility and usability of web pages and web sites.
Web standards, in the broader sense, consist of the following:

Recommendations published by the World Wide Web Consortium (W3C)^[2]
Internet standard (STD) documents published by the Internet Engineering Task Force (IETF)
Request for Comments (RFC) documents published by the Internet Engineering Task Force ^[3]
Standards published by the International Organization for Standardization (ISO) ^[4]
Standards published by Ecma International (formerly ECMA) ^[5]
The Unicode Standard and various Unicode Technical Reports (UTRs) published by the Unicode Consortium ^[6]
Name and number registries maintained by the Internet Assigned Numbers Authority (IANA) ^[7]

Web standards are not fixed sets of rules, but are a constantly evolving set of finalized technical specifications of web technologies.^[8] Web standards are developed by standards organizations—groups of interested and often competing parties chartered with the task of standardization—not technologies developed and declared to be a standard by a single individual or company.^[8]
It is crucial to distinguish those specifications that are under development from the ones that already reached the final development status (in case of W3C specifications, the highest maturity level).^[8]

(From Wikipedia)

The Unicode Consortium

http://www.unicode.org/
All I wanted was a clear-cut definition of the difference between Unicode and UTF-8.
This opened a whole can of worms!!
Hence, my blogging...

Unicode

Just some random musings on Unicode from Joel's blog:

Back in the semi-olden days, everything was very simple. EBCDIC was on its way out. The only characters that mattered were good old unaccented English letters, and we had a code for them called ASCII which was able to represent every character using a number between 32 and 127. Space was 32, the letter "A" was 65, etc.
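To see those numbers for yourself, here is a minimal Python sketch (my own illustration, not from Joel's post). It uses only the built-in `ord` function; the sample string and variable names are made up for the example.

```python
# Look up the numeric code of a few plain English characters.
for ch in [" ", "A", "a", "z"]:
    print(repr(ch), ord(ch))

# Every character of an unaccented English string has a code below 128,
# so it fits in 7 bits.
sample = "Hello, ASCII"  # hypothetical sample text
codes = [ord(c) for c in sample]
all_below_128 = all(code < 128 for code in codes)
print(codes, all_below_128)
```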
The IBM-PC had something that came to be known as the OEM character set which provided some accented characters for European languages and a bunch of line drawing characters, horizontal bars, vertical bars, horizontal bars with little dingle-dangles dangling off the right side, etc., and you could use these line drawing characters to make spiffy boxes and lines on the screen, which you can still see running on the 8088 computer at your dry cleaners'.
In fact as soon as people started buying PCs outside of America all kinds of different OEM character sets were dreamed up, which all used the top 128 characters for their own purposes. For example on some PCs the character code 130 would display as é, but on computers sold in Israel it was the Hebrew letter Gimel (ג), so when Americans would send their résumés to Israel they would arrive as rגsumגs. (Love this analogy!!)

Eventually this OEM free-for-all got codified in the ANSI standard. In the ANSI standard, everybody agreed on what to do below 128, which was pretty much the same as ASCII, but there were lots of different ways to handle the characters from 128 and on up, depending on where you lived. These different systems were called code pages. So for example in Israel DOS used a code page called 862, while Greek users used 737. They were the same below 128 but different from 128 up, where all the funny letters resided.
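The code page idea is easy to try out in Python, which ships codecs for the old DOS code pages. This is only a sketch: code pages 862 and 737 are the ones named above, while `cp437` as the original US IBM PC code page is my assumption, since the quoted text does not name it.

```python
import unicodedata

raw = bytes([130])  # the single byte value 130 (0x82)

# The same byte decoded under three different DOS code pages.
as_us = raw.decode("cp437")      # assumed: original US IBM PC code page
as_hebrew = raw.decode("cp862")  # Hebrew DOS code page
as_greek = raw.decode("cp737")   # Greek DOS code page

for label, ch in [("437", as_us), ("862", as_hebrew), ("737", as_greek)]:
    print(label, unicodedata.name(ch))

# The resume mix-up: written on a US machine, read on a Hebrew one.
resume = "r\u00e9sum\u00e9"            # "résumé", spelled with escapes
sent_bytes = resume.encode("cp437")    # bytes that travel to Israel
garbled = sent_bytes.decode("cp862")   # what the recipient sees
print(ascii(garbled))

# Below 128 the code pages agree, so plain English text is unaffected.
plain = b"plain text"
same_below_128 = plain.decode("cp437") == plain.decode("cp862") == plain.decode("cp737")
print(same_below_128)
```

The bytes never change in this sketch. Only the table used to interpret them does, which is exactly why the sender and the reader have to agree on it.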

Almost every stupid "my website looks like gibberish" or "she can't read my emails when I use accents" problem comes down to one naive programmer who didn't understand the simple fact that if you don't tell me whether a particular string is encoded using UTF-8 or ASCII or ISO 8859-1 (Latin 1) or Windows 1252 (Western European), you simply cannot display it correctly or even figure out where it ends. There are over a hundred encodings and above code point 127, all bets are off.
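This also answers my original question. Unicode assigns each character a number (a code point), and UTF-8 is one of several ways to turn those numbers into bytes. The Python sketch below is my own illustration of that split and of what goes wrong when the encoding is guessed incorrectly. The variable names are made up for the example.

```python
text = "r\u00e9sum\u00e9"  # "résumé", spelled with escapes

# Unicode: each character has a code point, independent of any byte layout.
code_points = [f"U+{ord(c):04X}" for c in text]
print(code_points)

# Encodings: different byte sequences for the same code points.
utf8_bytes = text.encode("utf-8")      # é becomes two bytes: C3 A9
latin1_bytes = text.encode("latin-1")  # é becomes one byte: E9
print(utf8_bytes, len(utf8_bytes))
print(latin1_bytes, len(latin1_bytes))

# Guessing the wrong encoding: UTF-8 bytes read as Latin-1 give gibberish.
wrong = utf8_bytes.decode("latin-1")
right = utf8_bytes.decode("utf-8")
print(ascii(wrong), right == text)

# The reverse mistake fails outright: these Latin-1 bytes are not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    reverse_failed = False
except UnicodeDecodeError:
    reverse_failed = True
print(reverse_failed)
```

Nothing in the byte string itself says which encoding produced it, so the only reliable fix is to carry that information along with the data.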

About the author.

I’m Joel Spolsky, co-founder of Fog Creek Software, a New York company that proves that you can treat programmers well and still be highly profitable. Programmers get private offices, free lunch, and work 40 hours a week. Customers only pay for software if they’re delighted. We make Trello, easy web-based collaboration software, FogBugz, an enlightened bug tracking and software development tool, and Kiln, a distributed source control system that will blow your socks off. I’m also the co-founder and CEO of Stack Exchange.

Kyle and Kurt Busch

Just about NASCAR!!

Friday, February 01, 2013