XML isn't a mark-up language.
secretGeek .:dot Nuts about dot Net:.

There. I've said it. XML isn't a mark-up language. I've said it twice. I must mean it. I'd better explain myself, quick.

This thought has been brewing inside my brain for too long now.

XML has always been called a mark-up language, and it's derived from SGML, which is also referred to as a mark-up language.

And maybe if enough people speak an untruth often enough, it becomes a sort of quasi-truth. A pseudo factoid, maybe.


My point is that while XML is similar to a mark-up language, it is in fact far more brutal than one. It oversteps the line and is, instead, an exclusive document formatting language. In the process it has lost a lot of the benefits of mark-up.

The idea of mark-up is elegant, light and beautiful when compared to the heavy 'document formatting' techniques we've taken on instead.

Consider a piece of text that is not marked up:

Take your god damn hands offa her.

If we want to "mark it up", we can use special codes of some sort to impose a second meaning on it.

Maybe we'd use slash characters:

Take your //god damn// hands offa her.

Or in gaXml we might say:

Take your <em/god damn/> hands offa her.

But to mark this text up using XML, we have to be a lot more brutal:

<?xml version="1.0"?>
<quote>
Take your <em>god damn</em> hands offa her.
</quote>

See, in XML you can't simply mark up the parts of the text you wish to give a special meaning to. You must give a special meaning to every single part of the document, even if you don't know anything special about the rest of it. Weird, isn't it?

For that reason, XML can never just add marks (or meaning) to portions of text. Instead, it becomes The One True Meaning Of The Document.

So, rather than the markup being placed inside the text, the text is dragged, kicking and screaming, into the markup.
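You can watch the all-or-nothing rule bite by handing the sentence to an actual XML parser. Here's a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `slash_marked` helper is just a toy regex for the //slash// convention above, not any real format or library.

```python
import re
import xml.etree.ElementTree as ET

# The sentence with only the emphasised words marked up, and no root element.
FRAGMENT = "Take your <em>god damn</em> hands offa her."


def parses_as_xml(text: str) -> bool:
    """Return True if the standard-library XML parser accepts text as a document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def slash_marked(text: str) -> list[str]:
    """Toy helper: pull out //slash-marked// spans and ignore everything else."""
    return re.findall(r"//(.+?)//", text)


# The bare fragment is rejected: XML wants one root element around everything...
fragment_ok = parses_as_xml(FRAGMENT)
# ...so the whole sentence gets dragged inside <quote> before it will parse.
quote = ET.fromstring(f"<quote>{FRAGMENT}</quote>")
# The slash convention only touches the words it marks.
marked = slash_marked("Take your //god damn// hands offa her.")

print(fragment_ok)            # False
print(quote.find("em").text)  # god damn
print(marked)                 # ['god damn']
```

The bare fragment fails with a `ParseError` because a well-formed XML document needs exactly one root element. The slash version never has to say anything about the words it leaves alone.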

Don't get me wrong: I do love XML. I'm just starting to hallucinate about the world beyond it.

The great benefit of XML is that it's easy to write an XML parser, because XML documents are so very machine-readable.

But by using such an all-encompassing format, we destroy any machine-readability that the document might otherwise have had.

For example, a C# source file is perfectly machine-readable (assuming the machine has a C# compiler). Ditto a Python file, a Ruby file or a CSS file.

But when we mark up that document using XML, we have to use an all or nothing approach.

Either we embed the C# file inside an XML document, and it can no longer be compiled by a C# compiler.

Or we embed XML-like comments in the C# document, and (since they're not quite XML) they can't be read by any XML parser on the planet. We need a specific "C# with embedded XML-like stuff" pre-parser to do the job for us. It doesn't matter how many XML parsers there are, because there aren't many of those pre-parsers available.
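To make the "embed the C# inside XML" option concrete, here's a rough Python sketch. The C# snippet is made up for illustration. The point is that `<`, `>` and `&` have to be escaped (or the code hidden in a CDATA section) before an XML parser will accept it, and what's stored is then no longer something you can feed straight to a C# compiler.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A made-up C# snippet: note the '<' in the generic type, the comparisons and '&&'.
CSHARP = "var xs = new List<int> { 1, 2, 3 };\nif (xs.Count < 4 && xs[0] > 0) { }"

# Embedding it in XML means escaping the markup-significant characters,
# so the stored text is no longer valid C# as it stands.
embedded = f'<source lang="csharp">{escape(CSHARP)}</source>'


def is_well_formed(xml_text: str) -> bool:
    """Return True if the standard-library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Dropping the raw code straight into an element is not well-formed XML.
raw_ok = is_well_formed(f"<source>{CSHARP}</source>")
escaped_ok = is_well_formed(embedded)
# Getting compilable C# back out means running it through an XML parser first.
round_trip = ET.fromstring(embedded).text

print(raw_ok, escaped_ok)        # False True
print(round_trip == CSHARP)      # True
print(embedded.splitlines()[0])  # <source lang="csharp">var xs = new List&lt;int&gt; { 1, 2, 3 };
```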

Life After Xml

I'd like to see a simpler mark-up language that allows for more flexible documents. (They'd still have strict hierarchy and well-formedness, like XML.)

I've tinkered with names for it... XXML, X2ML, 2XML, iXML... For this blog entry I'll stick with the name iXml, though I'm not at all attached to it.

XML would be just a subset of iXml. In other words, we'd need to rewrite the genealogy of XML.

Instead of following this old blood line:

<sgml>
  <html />
  <xml>
    <xsl />
    {etc.}
  </xml>
</sgml>

(i.e. XML is a child of SGML, XSL is a child of XML, etc.)

We'd change it to be:

<iXml>
  <xml>
    <xsl />
  </xml>
  <css />
  <c />
  <c# />
  <ruby />
  {etc.}
</iXml>

(i.e. XML is a child of iXml. CSS grammar is also a descendant of iXml... so are many other formal grammars, provided they are strict.)

The idea of iXml is that it doesn't need to ruin a document's existing machine-readability (or human readability for that matter).

Anonymous Structure

Thanks to thinking about LINQ, anonymous types and some other stuff, I've thought of something else that XML lacks because of its verbosity. This could never be pushed into XML, but it could fit into iXml nicely.

Consider this nice piece of CSV:

1,2,3,4

How would that look in XML?

<?xml version="1.0"?>
<Numbers>
<Number>1</Number>
<Number>2</Number>
<Number>3</Number>
<Number>4</Number>
</Numbers>

Now the official excuse for this sort of monstrosity goes back to the XML spec, where they say:

Terseness in XML markup is of minimal importance.

But the problem with the above XML example is not just its verbosity; it also forces us to invent metadata. The list in its original form didn't impose any kind of type or name upon the data.
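Here's the same 1,2,3,4 list going both ways, as a small Python sketch using the standard `csv` and `xml.etree.ElementTree` modules. The CSV side needs no names at all. The XML side can't even be built until we make up `Numbers` and `Number`, which are just the arbitrary names from the example above.

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_TEXT = "1,2,3,4"

# Reading the CSV needs no names at all: we just get a row of values.
row = next(csv.reader(io.StringIO(CSV_TEXT)))

# Writing the same values as XML forces us to invent a name for the list
# and a name for each item ("Numbers"/"Number" are arbitrary choices).
root = ET.Element("Numbers")
for value in row:
    ET.SubElement(root, "Number").text = value

xml_text = ET.tostring(root, encoding="unicode")

print(row)                          # ['1', '2', '3', '4']
print(xml_text)                     # <Numbers><Number>1</Number><Number>2</Number><Number>3</Number><Number>4</Number></Numbers>
print(len(CSV_TEXT), len(xml_text)) # 7 91
```

Even without the XML declaration and indentation, the invented names make the XML thirteen times the size of the CSV.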

So here's a different feature of iXml: it allows for anonymous "structure-only" markup!

<>
<>1</>
<>2</>
<>3</>
<>4</>
</>

So in the above case we can see that the data values 1, 2, 3 and 4 are all siblings. We don't need to invent an element name for them if all we want to impose on them is a hierarchical set of relationships.

(I guess a rule would also be needed to define an empty anonymous element, since in the example above '</>' is already taken to mean an anonymous closing tag. I figure an empty anonymous element would be written '< />'. Can't think what you'd use it for... a placeholder of sorts, I guess.)
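iXml doesn't exist, so here's a purely hypothetical Python sketch of what parsing the anonymous markup might involve. It assumes just three tokens (`<>`, `</>` and the `< />` empty element suggested above) and turns each anonymous element into a nested list. It also doesn't require a single root, so a plain sentence with one marked-up span parses too.

```python
import re

# Hypothetical iXml tokens (not a real format): "<>" opens an anonymous
# element, "</>" closes it, and "< />" is an empty anonymous element.
TOKEN = re.compile(r"(< />|<>|</>)")


def parse_anonymous(text: str) -> list:
    """Parse anonymous structure-only markup into nested lists.

    Each anonymous element becomes a list; each run of text becomes a
    stripped string. Whitespace-only text between tags is ignored.
    """
    root: list = []
    stack = [root]
    for token in TOKEN.split(text):
        if token == "<>":
            child: list = []
            stack[-1].append(child)
            stack.append(child)
        elif token == "</>":
            if len(stack) == 1:
                raise ValueError("unbalanced '</>'")
            stack.pop()
        elif token == "< />":
            stack[-1].append([])
        elif token.strip():
            stack[-1].append(token.strip())
    if len(stack) != 1:
        raise ValueError("unclosed '<>'")
    return root


doc = """<>
<>1</>
<>2</>
<>3</>
<>4</>
</>"""

print(parse_anonymous(doc))  # [[['1'], ['2'], ['3'], ['4']]]
print(parse_anonymous("Take your <>god damn</> hands offa her."))
# ['Take your', ['god damn'], 'hands offa her.']
```

Stripping text keeps the output readable, but a real design would have to preserve meaningful spacing inside prose.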

Polyglotics

A polyglot is a person who speaks more than one language. In programming, a polyglot is a document that is valid in more than one language.

It sounds like a dangerous and bad thing. It sounds like a maintenance nightmare.

In fact, polyglotics is already all around us.

A static HTML document can be polyglotic, combining HTML and CSS in a single document.

A dynamic HTML file will combine three different syntaxes, each delimited in a different way: HTML, JavaScript and CSS.

A worst-case document might combine HTML, JavaScript, CSS, ASP, VBA, embedded SQL, regular expressions, comma-separated values and embedded XML, all in the one document!

It's a pity that XML, because of its overreaching design, doesn't allow for polyglotics.

An iXml parser would be able to separate those threads out, treat them separately, and even allow different people to work on each of those sections simultaneously...
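For the plain HTML case, pulling the threads apart is already doable with off-the-shelf tools. Here's a rough Python sketch using the standard library's `html.parser.HTMLParser` that routes `<style>` contents into a CSS bucket, `<script>` contents into a JavaScript bucket, and everything else into HTML text. It's only an illustration: a general iXml parser would need rules for every embedded language, and this one ignores inline attributes such as `style="..."` entirely. The `PolyglotSplitter` name is made up.

```python
from html.parser import HTMLParser


class PolyglotSplitter(HTMLParser):
    """Route the text of a page into separate per-language 'threads'.

    Text inside <style> is treated as CSS, text inside <script> as
    JavaScript, and all other text content as HTML.
    """

    def __init__(self) -> None:
        super().__init__()
        self.threads: dict[str, list[str]] = {"html": [], "css": [], "javascript": []}
        self._current = "html"

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._current = "css"
        elif tag == "script":
            self._current = "javascript"

    def handle_endtag(self, tag):
        if tag in ("style", "script"):
            self._current = "html"

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        if data.strip():
            self.threads[self._current].append(data.strip())


PAGE = """<html><head>
<style>em { color: red; }</style>
<script>console.log("hi");</script>
</head><body><p>Take your <em>god damn</em> hands offa her.</p></body></html>"""

splitter = PolyglotSplitter()
splitter.feed(PAGE)
splitter.close()
for language, chunks in splitter.threads.items():
    print(language, chunks)
```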

Alright, I've blurted out too much about my inner inklings now. Finally, here's a rewrite of the design goals of XML, appropriated for iXml:

  1. iXML shall be straightforwardly usable over the Internet. Like duh.
  2. iXML shall support WAY MORE applications than XML.
  3. XML shall be compatible with SGML. (screw sgml!)
  4. It shall be fairly easy to write programs which process documents containing iXML markup.
    [Not as easy as XML... but who needs 1,000,000 parsers? A dozen good ones would suffice.]

  5. The number of optional features in iXML will be much higher than in XML... deal with it.
  6. Documents containing iXML markup should be WAY MORE human-legible and reasonably clear than those dodgy XML docs!
  7. The iXml design should be prepared quickly.
  8. The design of iXml shall be sorta formal and sorta concise.
  9. iXml markup shall be easier to create than XML, with its dodgy "let's take over the entire document" philosophy.
  10. Terseness in iXml markup is of relative importance.

Tim Bray, I await your reply. ;-)





'asdfasdf' on Sat, 07 Oct 2006 22:11:19 GMT, sez:


Mmm.. You have the somewhat self-important and very redundant style of "Pill" Limbaugh, and managed to include about as much thought in your posting.

Interesting ideas are:
(a) not mundane
(b) would better be expressed tersely and logically, in a language consisting of pre-filtered substance rather than oceans of water

The first part on the philosophy of XML is scatterbrained, the second, on "polyglotics" is not only obvious, it's long been in use, and without much fanfare.



'lb' on Sun, 08 Oct 2006 18:53:11 GMT, sez:

>The first part on the philosophy of XML is scatterbrained

yeh, you got me there.

>the second, on "polyglotics" is not only obvious, it's long been in use, and without much fanfare

care to provide an example?



'lb' on Fri, 18 May 2007 04:32:03 GMT, sez:

>Interesting ideas ... would better be
>expressed tersely and logically

you're right -- this is my gripe with XML. It's not capable of terse expressions.




 
© Leon Bambrick 2006
