HTML5 and Security

I’ve been playing pretty heavily with HTML5 for a while now, and actually my website is written in it. I should say lightly written in it, as I didn’t take advantage of the new semantics in defining the page structure (section, article, etc.) except for the <header> element. I used the HTML5 DOCTYPE more to take advantage of the cool new CSS3 properties (box-shadow, border-radius, etc.) and values available. I know, I should use the new page structure and I will, but time was of the essence and I needed to get the site up and running, and I didn’t completely understand the semantics of the new elements yet.

Basically my web pages are written in (X)HTML so I’m not too worried at the moment, but as I’m planning to get deeper into 5 and I’m beginning a new web site I’ve become aware of some security issues at www.securityweek.com/top-10-security-threats-html5-black-hat that are arising with some of the new features that 5 provides…

There are plenty of resources out there I’m going to delve into, but the problem with many of them is that they assume a level of understanding I don’t have, so it’s hard for me to get an entry-level grasp on things. I do see I’m going to have to kick things up a few levels in the near future to be able to take full advantage of HTML5 and address the security issues involved.

I’m hoping someone here can give me a beginner’s explanation of a few of these new features in HTML5:

  1. XMLHttpRequest (XHR)
  2. cross-origin resource sharing (CORS)
  3. webSQL
  4. localstorage

and address a few of the security issues with these components, like:

  1. CORS bypass
  2. Click jacking
  3. HTML5-driven cross-site scripting using tags, events and attributes
  4. Exploiting Browser SQL points

Those CSS3 features are independent of the doctype.

I know, I should use the new page structure

There’s no rush, as the semantics of that structure are not yet recognized by browsers, screen readers etc., and possibly never will be.

I’ve become aware of some security issues

Certainly there are some, but they are being worked on, and it will mostly come down to careful coding to avoid issues.
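
To make the “careful coding” point a bit more concrete: a lot of cross-site scripting comes down to untrusted text being dropped into a page as if it were markup. Here is a minimal sketch of escaping such text first. The escapeHtml name is made up for this example, and it only covers text placed in element content or a quoted attribute value, not URLs, inline scripts or CSS.

```javascript
// Hypothetical helper: escape the five characters that let text break out of
// HTML element content or a quoted attribute value.
function escapeHtml(untrusted) {
  return String(untrusted)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// A user comment that tries to smuggle in a newer element plus an
// event-handler attribute.
const comment = '<video src=x onerror="alert(1)">';
const safeMarkup = "<p>" + escapeHtml(comment) + "</p>";
console.log(safeMarkup);
```

After escaping, the browser shows the comment as plain text instead of creating a video element with a script attached to it.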

There are lots of new browser APIs (like some you listed) that loosely fall under the popular banner of HTML5, when in reality they have nothing to do with HTML. They may or may not be useful to you, but you don’t have to use them. A ‘browser API’ basically means a cool new thing that browsers can do, and you utilize each with JavaScript … so HTML doesn’t really come into it.

Wikipedia has a pretty good intro to each of those features. Note that XMLHttpRequest is not an HTML5 feature, but is just the more technical name for what’s popularly known as Ajax.
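
For a beginner-level picture of what “Ajax” looks like in code, here is a rough sketch of a GET request with XMLHttpRequest. The getText name and the optional third parameter are made up for this example; the third parameter exists only so the sketch can be tried outside a browser with a stand-in object. In a page you would leave it off and the browser’s own XMLHttpRequest is used.

```javascript
// Hypothetical helper: fetch a URL as text and hand the result to a callback.
// callback(error, text) follows the common "error first" convention.
function getText(url, callback, Xhr = globalThis.XMLHttpRequest) {
  const xhr = new Xhr();
  xhr.open("GET", url);
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      callback(null, xhr.responseText);
    } else {
      callback(new Error("HTTP " + xhr.status));
    }
  };
  // Fires on network failures, and also when the browser blocks a
  // cross-origin response that the server did not allow.
  xhr.onerror = function () {
    callback(new Error("Network or cross-origin error"));
  };
  xhr.send();
}

// In a browser page (the path is an assumption):
//   getText("/latest-news.txt", function (err, text) {
//     if (!err) document.title = text;
//   });
```

The page stays loaded while the request runs in the background, which is the whole idea behind Ajax.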

Thank you.

XMLHttpRequest (XHR)
cross-origin resource sharing (CORS)
webSQL

Note that webSQL was implemented in two browser engines (WebKit and, I think, Opera) but was dropped in favour of Mozilla’s IndexedDB (which is a fancy B-tree and wasn’t really meant to be used by devs directly: Mozilla kinda assumed people would build libraries to interface with that).

XHR2, so far as I know, can only do cross-domain requests when the server you’re calling uses CORS to allow the Origin you’re sending from. Please don’t set your Access-Control-Allow-Origin to ‘*’ … unless you’re really in the business of hosting a server whose job it is to make data public.
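
As a sketch of the alternative to ‘*’: the server can keep a short allowlist and echo back the request’s Origin only when it is on that list. The function name and the example.com origins below are assumptions for illustration, not anything from a particular server or framework.

```javascript
// Hypothetical allowlist: the only sites permitted to read our responses.
const ALLOWED_ORIGINS = ["https://www.example.com", "https://shop.example.com"];

// Decide which CORS response headers to send for a given request Origin header.
function corsHeadersFor(requestOrigin, allowed = ALLOWED_ORIGINS) {
  // Vary: Origin tells caches that the response differs per requesting origin.
  const headers = { "Vary": "Origin" };
  if (typeof requestOrigin === "string" && allowed.includes(requestOrigin)) {
    headers["Access-Control-Allow-Origin"] = requestOrigin;
  }
  return headers;
}

console.log(corsHeadersFor("https://www.example.com"));
console.log(corsHeadersFor("https://evil.example.net"));
```

An origin that is not on the list simply gets no Access-Control-Allow-Origin header, so the browser refuses to hand the response to the calling script.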

Click jacking

This has always been possible, even before the HTML5 stuff. Users should protect themselves with something like NoScript. I don’t know what website owners can do to prevent such a thing (though I’m sure there are tools and tips out there, I just don’t know them).
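
For what it’s worth, one widely documented thing a site owner can do is send response headers that ask the browser not to display the page inside a frame on someone else’s site. X-Frame-Options and the Content-Security-Policy frame-ancestors directive both do this. A minimal sketch follows; the antiFramingHeaders name is made up, and how the headers get attached depends on the server in use.

```javascript
// Returns response headers asking the browser not to display this page
// inside a frame on any other page.
function antiFramingHeaders() {
  return {
    "X-Frame-Options": "DENY",
    "Content-Security-Policy": "frame-ancestors 'none'",
  };
}

// Hypothetical usage with Node's built-in http module (not run here):
//   http.createServer(function (req, res) {
//     for (const [name, value] of Object.entries(antiFramingHeaders())) {
//       res.setHeader(name, value);
//     }
//     res.end("<!doctype html><title>ok</title>");
//   });
console.log(antiFramingHeaders());
```

Sites that need to frame their own pages could use SAMEORIGIN and 'self' in place of DENY and 'none'.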

This is a good topic to discuss, though a lot of these are things the vendors have to deal with, so for some of them there’s little devs or users can do.

HTML 5, CSS 3 and JavaScript are all completely different things.

XMLHttpRequest (XHR), cross-origin resource sharing (CORS), webSQL and localstorage are all a part of JavaScript and have nothing whatever to do with any version of HTML. They will work just as well with any version of HTML.

I don’t really know enough about how these new features in HTML5 work to comment intelligently; I’m doing the research as I can. I understood there is a NoScript option in code and with browser plugins, but I also read somewhere (I don’t remember where now) that this can be overridden because of new CORS (Ajax requests) features.

The point is, I want to understand exactly what the security issues are so I can head them off. In order to do that I need to understand how these functions work, like the new browser storage capabilities. I want to assure my client that their customers can submit credit card and other personal information on the website, and click on a link without being taken to a malicious site, without being compromised.
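
On the browser storage question, a rough sketch may help. localStorage is a simple per-site store of string keys and string values that any script running on the page can read, which is why it suits harmless preferences and not card numbers. The helper names below are made up, and createMemoryStorage is only a stand-in so the sketch runs outside a browser; in a page you would pass window.localStorage instead.

```javascript
// Hypothetical in-memory stand-in with the same getItem/setItem/removeItem
// methods as the browser's localStorage, so the sketch runs outside a browser.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
    removeItem: (key) => { data.delete(key); },
  };
}

// Store a harmless preference. Values are always kept as strings.
function saveTheme(storage, theme) {
  storage.setItem("theme", theme);
}

// Read it back, falling back to a default when nothing was stored.
function loadTheme(storage) {
  const value = storage.getItem("theme");
  return value === null ? "light" : value;
}

const prefs = createMemoryStorage(); // in a browser: window.localStorage
saveTheme(prefs, "dark");
console.log(loadTheme(prefs));
```

Anything stored this way is visible to every script the page runs, including an injected one, so sensitive data is better kept on the server.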

I don’t know if you’ve read about the newly rolled out 93.7 million dollar government website for the Affordable Health Care Plan sign up, but reportedly it has five major security holes unique to the HTML5 DOCTYPE and the new CORS functions, as the site utilizes Ajax, allowing a hacker to somehow implement redirects to a malicious site. Of course this was on top of all the other errors and glitches on the site, which is another topic, and also beyond my comprehension for a price tag of that magnitude.

Here is an example of the kind of thing I want to research. This is a quote about Shreeraj Shah’s talk at a Black Hat security conference: “[Shah called] the XHR object in HTML5 “very powerful,” as it allows a variety of features, such as cross-origins requests and binary uploads and downloads. Attacks include bypass CORS preflight calls, forcing authentication cookies to replay with credentials, internal network scanning and tunneling, information harvesting, and abusing the business logic by uploading binary streams. Users could be tricked into uploading content onto the server, Shah said.”

I refer you all to this article again. http://www.securityweek.com/top-10-security-threats-html5-black-hat

I’ll put this another way then. According to what I’m reading in this article HTML5 interacts differently with JavaScript enhancing its capabilities. Larger local storage ability, enhanced webSQL etc. Is this not true then? Is this statement by Shah just totally bogus then?

“HTML5 faces a number of threats, including cross-site scripting and resource hijacking, Shreeraj Shah, founder of application security vendor Blueinfy, told attendees at the Black Hat security conference in Las Vegas Thursday. The fact that the new Web standard has cross-platform support and integrates several other technologies increases the attack surface, Shah said.”

However, both the W3C and WHATWG consider the DOM to be JavaScript’s main API, which was incorrectly neglected in the specs in earlier days, and HTML is intrinsic to front-end JavaScript. These new JS APIs are considered part of “HTML5”, since the WHATWG doesn’t use HTML5 to mean a version of HTML (instead, it’s “HTML, the living standard”), while the W3C currently has the versions HTML 5.0 and a 5.1 that is still being edited, and the APIs are in the spec, even when it’s called a separate spec.

Be very clear on this: the doctype does nothing. It means nothing. These security holes are based on browsers being released and used that happen to have new capabilities built into them. The doctype has nothing to do with that. People could be using the HTML 3.2 IETF doctype and it wouldn’t change a thing.

The Doctype is used for one thing and one thing only: doctype switching, for browsers who decided to use the existence (or lack thereof) of a doctype to determine which layout rendering system they would use, under the broad assumption that “old pages don’t have doctypes” and “new pages do”.
When people started discussing “HTML5”, they realised browsers weren’t actually reading doctypes. They ignore most of it, anyway. What
<!doctype html>
is, is the shortest possible string of characters we could get away with that browsers would still take to mean that, yes, there is indeed a doctype, so they will render in “standards mode”. Some of us were seriously hoping for anything like
<!doctype foo>
but that wasn’t possible.
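
If you want to see the doctype switch for yourself, browsers expose the result as document.compatMode, which is “BackCompat” in quirks mode and “CSS1Compat” otherwise. A small sketch, with a made-up renderingMode helper that accepts any object with that property so it can be tried outside a browser too:

```javascript
// Report which rendering mode a document ended up in.
// document.compatMode is "BackCompat" in quirks mode and "CSS1Compat" otherwise.
function renderingMode(doc) {
  return doc.compatMode === "BackCompat" ? "quirks" : "standards";
}

// In a browser console you would call: renderingMode(document)
// Outside a browser we can only feed it stand-in objects:
console.log(renderingMode({ compatMode: "CSS1Compat" }));
console.log(renderingMode({ compatMode: "BackCompat" }));
```

Loading the same page with and without its doctype line and checking this value shows that the mode is the only thing the doctype changes.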

Everything else is simply a browser development. Some of them are being released while not including ways for users to turn it off or refuse. This is what’s nice about NoScript: it can block lots of things for you and let you choose what should run in your browser and what not. Web Fonts using @font-face for example. There’s no built-in way I know of to block those, but NoScript can.

Yes, XHR has been made more powerful, and I think one problem web security has always had and will continue to have, is people making and hosting websites without any idea how a lot of this stuff works.

So I think this is an important topic, and I’d also love to hear good answers on “what should Joe website owner do for situation X”, but we should stop calling these “HTML5 technologies” and instead the more correct “new browser technologies”.

Thanks for addressing my post. Everyone’s points are well taken, essential, and important to understanding the relationship/difference between HTML (structural semantic markup) and web applications/technologies implemented in browsers. It’s really HTML 101 stuff, but as time goes on it’s easy to forget the basics, especially with all of the new/evolved web technology changing the landscape so often.

It did become a grey area to me with HTML’s new specifications and my inadequate understanding of how it now relates to Web applications. I think part of my misunderstanding of W3C’s HTML5 and WHATWG’s “HTML living standard” is I thought they were reinventing the wheel somehow rather than just helping it evolve, which is the approach they decided to take.

Since XHTML doesn’t have anything except standards mode, you can get away with a doctype that is 15 characters shorter than the HTML 5 doctype when you use XHTML 5, i.e. a doctype is not needed at all, and the first tag in XHTML 5 can be:

<html xmlns="http://www.w3.org/1999/xhtml">

(there are other namespaces you can use with XHTML 5 - see http://www.w3.org/TR/2011/WD-html5-20110113/namespaces.html for a list).

Assuming of course your server is sending this out as real XHTML… does this really bypass all browsers’ doctype switching? Even Firefox for example?

The living standard stuff was some decision that there shouldn’t be versions in HTML because various vendors add various things whenever they’re ready, in a constant stream of development. But certainly both groups’ HTML is markup built upon older versions of HTML markup, as well as additions to original JS APIs. The biggest difference seems to me to be that Javascript is now rightly a part of the spec, as it works intrinsically with browsers (and the spec is partially written for authors like us, and partially written for vendors… what should their browsers do with X?) but also with markup.

Since I had to find out why my Firefox kept going to ze googles with my search terms instead of the search engine I specified… I ran across this page of new things in the Firefox browser. Skip down to “Content Security Policy 1.0” (which, btw, is not a heading, even though someone styled it to look like one… bad bad bad). This is an example of a vendor deciding how they will deal with CORS and cross-site scripting.
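
For anyone who hasn’t met it, Content Security Policy is a response header in which a site lists where scripts, styles and so on may be loaded from, and the browser refuses the rest. A minimal sketch of putting such a header value together. The buildCsp name and the CDN host are made up for this example, and a real policy needs tuning to the site.

```javascript
// Build a Content-Security-Policy header value from a map of
// directive name -> list of allowed sources.
function buildCsp(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => name + " " + sources.join(" "))
    .join("; ");
}

// Hypothetical policy: load everything from our own origin only, plus
// scripts from one made-up CDN host.
const policy = buildCsp({
  "default-src": ["'self'"],
  "script-src": ["'self'", "https://cdn.example.com"],
});
console.log(policy);
```

With a policy like this, inline scripts and scripts from other hosts are blocked, which takes the teeth out of many injected-markup attacks.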

While HTML has quirks mode and standards mode, so that HTML 5 has to use the <!doctype html> tag to ensure the page uses standards mode, there is no quirks mode in real XHTML, and with no mode other than standards mode there is no need for a switch (which is all that tag is for in HTML 5).