What happened to my HTML? (a.k.a. Byte-Order Mark)

I am having problems validating my most recent projects, which I developed locally using XAMPP. Previously, I did not have any validation issues with the beginning of my HTML (see sample screenshot from Firefox):

but now my browser is showing this problem (different project):

and the validator at w3.org is giving me this as the first of 5 errors:
> Non-space characters found without seeing a doctype first. Expected <!DOCTYPE html>.

and this comment:

Byte-Order Mark found in UTF-8 File.

  The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported 
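
For context, the UTF-8 BOM is the three bytes EF BB BF at the very start of a file. They encode the character U+FEFF and are invisible in most editors, which is why the markup can look fine while the validator still sees something before the doctype. A minimal Python sketch of checking for those bytes follows; the helper name `has_utf8_bom` and the sample byte strings are made up for illustration.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

# Hypothetical file contents: the same markup with and without a leading BOM.
with_bom = codecs.BOM_UTF8 + b'<!DOCTYPE html>\n<html lang="en">'
without_bom = b'<!DOCTYPE html>\n<html lang="en">'

print(has_utf8_bom(with_bom))     # True
print(has_utf8_bom(without_bom))  # False
```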

I have no idea what is suddenly causing this, but I am ready to launch a simple event registration system that I made for a client, and would like to clear this issue up first if possible.

If it is relevant, for both these samples, I used Notepad++ for my text editor.

Does anyone have any idea what is causing the BOM?

Is Notepad++ set to UTF-8? Looks like the editor is causing this.

I think so too. I checked and it was set to ANSI, so I changed it to UTF-8. My problem now is that even when I resave all my files with the new setting, I can’t get rid of this BOM issue.

But the thing is - the settings were the same for all my other projects in the past, which didn’t show this problem. Why would it suddenly start in the past week?

Can you give us the code that’s giving you the BOM? Paste it in here. In my experience the culprit is usually a quotation mark (") that isn’t UTF-8 and is hard to identify. Give us the code that’s tripping the validator.

I gave you what is tripping the validator. It won’t look past the opening body tag because it is the doctype that is initiating all the issues. I am using PHP includes for the header and footer, as well as my config file where I load the helper functions and have my class autoloader.

There are no quotes or other symbols in the code. The one application that is causing this error is a PHP-based registration system, and the other application is my PHP-based collection of business tools.

Is there anything I can do with my files to remove the BOM?

I can’t copy/paste that into the validator though. That’s a picture.

In Notepad++, right next to Encode in UTF-8, there’s an Encode in UTF-8 without BOM option.
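
If the menu option doesn’t seem to take effect, the same conversion can be done outside the editor. Here is a rough sketch in Python, assuming the file really is UTF-8; the function name `resave_without_bom` is hypothetical. It relies on the standard `utf-8-sig` codec, which skips a leading BOM when decoding, while plain `utf-8` never writes one when encoding.

```python
from pathlib import Path

def resave_without_bom(path: Path) -> None:
    """Rewrite a UTF-8 file in place so that it has no leading BOM."""
    # "utf-8-sig" drops a BOM if there is one and otherwise acts like "utf-8".
    text = path.read_bytes().decode("utf-8-sig")
    # Plain "utf-8" does not emit a BOM; working on bytes keeps line endings as they are.
    path.write_bytes(text.encode("utf-8"))
```

Running it on a file that has no BOM leaves the file unchanged, so it should be safe to apply to every file in a project (on a backup copy first).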

@Jeff_Mott, I tried that, and then tried saving all my files again. Is that how I can convert them? I couldn’t get anything to change.

@RyanReese, there were five errors. The one I quoted here was the one that tripped the other four (all regarding the head of the document). The validator would not go any further than that.

The last error listed was

Line 5, Column 30:
Cannot recover after last error. Any further errors will be ignored.
<html lang="en" class="no-js">

Here is one sample, if you think it will tell you something it’s not telling me.
You might notice something different, I guess.

<!DOCTYPE html>

<html>

<html lang="en" class="no-js">
<head>
	<meta charset = "utf-8">
	<title>Environmental Event Registration Form</title>
	<link rel="stylesheet" type="text/css" href="../css/style.css" />
	<script type="text/javascript" src="js/jquery-1.4.4.min.js"></script>
	<script type="text/javascript" src="formvalidation.js"></script>
	<script type="text/javascript" src="gen_validator4.js"></script>
</head>

<body class="admin login-form">

	<div id="container">
		<div id = "wrapper">
			<header>
				<img src="../images/logo.png" id="logo" width="156" height="150" alt="MBC logo" />
				<h2>Registration for Environmental Events</h2>
				<h1 class="heading-blue">Admin Area</h1>
			</header>
			<nav>
				<ul>
					<li><a href="events.php">Events List</a></li>
					<li><a href="event_form.php">Add an Event</a></li>
					<li><a href="edit_event.php">Edit an Event</a></li>
					<li><a href="index.php?action=logout">Log Out</a></li>
				</ul>
			</nav>
<div id = "main-content">
	<div id="event-form">
		<h2>Please Log In</h2>
		     <form action="index.php" method="post" id="login">
			<p>
				<label for="username" class="username">Username:</label>
				<input type="text" name="username" id="username" value="">
			</p>
			<p>
				<label for="password" class="password">Password:</label>
				<input type="password" name="password" id="password" value="">
			</p>
			<p>
				<input type="submit" name="login" value="Log In">
			</p>	
		</form>
	</div> <!-- end of #event-form -->	 

</div> <!-- end of #main-content -->	

 	</div>  <!-- end of #wrapper -->
	<footer>
		<p>copyright &copy; 2015  Event Registration</p>
	</footer>
</div>  <!-- end of #container -->

</body>

</html>

Hi, WebMachine.

Can you zip the file that is failing validation and upload/attach it here? Perhaps @RyanReese, or @Jeff_Mott, (one of us) will be able to find the problem with the BOM (assuming it is not being imported by PHP).

Yes, I imagine Discourse strips out whitespace and posted content will be useless for detecting a BOM unless it’s attached.

@WebMachine my thinking was that with the actual code I could more easily find the BOM.

However, as others stated, Discourse probably filters this.

A zip file will be needed.

It’s useless to try and figure out this issue just by looking at screenshots.

I can send a zip file - will do it later today when I get a break from meetings and work, but in the meantime I would rather you tell me some strategies I could use to find it myself. I have never run across this problem before, and would like to be able to handle it myself if I ever run across it again.

I used a script to see which file(s) contained the BOM - it turns out it’s in all of my files for the application. Is there a PHP script that could strip out the BOM?
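
The checking script itself isn’t shown in the thread. As a rough idea of what a find-and-strip pass could look like, here is a minimal Python sketch. It is not the script that was used; the function names and the list of extensions are assumptions.

```python
import codecs
from pathlib import Path

# Assumption: these are the file types worth checking in a PHP project.
EXTENSIONS = {".php", ".html", ".css", ".js"}

def find_bom_files(root: Path) -> list[Path]:
    """Return files under root whose first three bytes are the UTF-8 BOM."""
    found = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in EXTENSIONS:
            with path.open("rb") as fh:
                if fh.read(3) == codecs.BOM_UTF8:
                    found.append(path)
    return found

def strip_bom(path: Path) -> None:
    """Remove a leading UTF-8 BOM from one file; all other bytes stay untouched."""
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        path.write_bytes(data[len(codecs.BOM_UTF8):])
```

To use it, call `find_bom_files()` with the project folder, review the list it returns, and then pass each path to `strip_bom()`. Working on a copy of the project first is the safer choice, since the files are rewritten in place.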

Maybe I should move this thread to the PHP forum?

I moved it for you, and am giving you this in the meantime.

http://php.net/manual/en/function.pack.php#104151

PHP is not my area of expertise.

Thanks, but I found a solution that works wonders.

http://emrahgunduz.com/categories/development/php/take-2-on-utf8-bom-remove-bom-with-php/

The code is on GitHub.

