DOMDocument script tag closing issue

svcghost · April 3, 2011, 11:03am

Hey guys,

I am using DOMDocument to load HTML, alter it, and then output it again. On pages where a script tag contains a “</” that is not part of “</script>”, the script tag is closed prematurely. Is there a way to get around this when using DOMDocument?

I tried searching far and wide and nobody seems to have found a solution.

For example

<script>document.write('<p>hello</p>');</script>

is output as:

<script>document.write('<p>hello</script>

so the script ends prematurely because of the “</” in the closing “</p>” tag.
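In case it helps anyone reproduce this, here is a minimal sketch of what I am doing. It assumes PHP with the DOM extension enabled; the variable names are only for illustration, and the exact output may well differ depending on the libxml2 version PHP is built against.

```php
<?php
// Minimal reproduction sketch: load a page containing the script tag
// from the example above, then serialize it back to HTML.
$html = "<html><body><script>document.write('<p>hello</p>');</script></body></html>";

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Discard the collected warnings; they are not needed for this check.
libxml_clear_errors();

// Print the serialized document and compare the <script> element
// with the original input.
echo $doc->saveHTML();
```

If the script element in the printed output stops at “hello”, the parser has treated the “</” inside the string as the end of the script.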

svcghost · April 3, 2011, 11:35am

I have now switched from DOMDocument to the HTML5 parser, and it seems to handle the script tags correctly. However, it doesn’t load every page. Why does it fail on some pages? Does it expect perfectly valid HTML markup? How can I get around that?