Htaccess Vocabulary

Correct.

Google will never see doesnotexist.html. That’s the difference between an external redirect (using the [R] flag) and an internal rewrite. The only thing Google’s bot will see is a request for robots.txt and a response with 404 status.
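To make the difference concrete, here is a minimal sketch of the two forms (the target filenames are just placeholders):

# External redirect: the server answers with a 3xx status and the new URL,
# so the client (Googlebot included) sees and then requests the target address.
RewriteRule ^robots\.txt$ somewhere-else.txt [R=301,L]

# Internal rewrite: the substitution happens entirely inside the server;
# the client only ever sees /robots.txt plus whatever status the target
# produces (a 404 here, since doesnotexist.html is not a real file).
RewriteRule ^robots\.txt$ doesnotexist.html [L]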

If ‘The only thing Google’s bot will see is a request for robots.txt and a response with 404 status’ means ‘The only thing Google’s bot will see, when doesnotexistatall.html is used, is a request for robots.txt and a response with 404 status’, then that’s great. That’s like a new tool: using an address that doesn’t exist when dealing with search engines that have hallucinations of pages that don’t exist.
Thanks a lot Mott, I’ll carry the news.

Mr. Mott that is.

I made the change and it looks like my webhost prevents the 404 error and directs me to the non-https address. I’m not sure what Google will think, especially since my webhost adds my main website address to the address, so now https pages are directed to http://xyz.mainsite.com. I’ll try the other code; I bet it does the same thing.

3 minutes later…
It does the same thing.

When my webhost was trying to get this to work, they put this in just before the code:

# For security reasons, Option followsymlinks cannot be overridden.
#Options +FollowSymLinks 
Options +SymLinksIfOwnerMatch

I have no idea if it’s needed for the code to work, or even if it disrupts the code in some way.

My webhost worked on the problem and said you can direct http to https but you can’t direct https to http. I told them I don’t want to go to an http address, but there seems to be no way to incur a 404 error without using an address in the code.

Apparently this is not an uncommon problem. They referred me to this page.

This doesn’t sound right to me. If you’re doing an internal rewrite, then there’s no http<=>https switch going on. It’s just a single https request that returns a 404 response.

Also, I’m not sure the linked thread backs them up. In that thread, people are discussing serving a different robots.txt depending on whether it was requested through http or https, which they accomplished with an internal rewrite, same as your own in this thread. There’s nothing implying that a request for robots.txt couldn’t return 404.
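For reference, the internal-rewrite version being discussed only takes a couple of lines; this is just a sketch, assuming your host reports port 443 for https requests (a %{HTTPS} check would work as well):

# Only touch requests that arrived over https
RewriteCond %{SERVER_PORT} ^443$
# Internally rewrite robots.txt to a path that does not exist,
# so the https request simply comes back as a 404
RewriteRule ^robots\.txt$ doesnotexistatall.html [L]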

This is the code they gave me before saying you can’t direct https to http.
RewriteCond %{SERVER_PORT} 443
RewriteRule ^(.*)$ http://xyz.com/404.html$1 [R,L]

I’ll call them again and start over, and see what happens.

Other people seem to have had success redirecting from https to http, so I’m not sure what to tell ya. Something funky is going on with your host.

They said the problem is being on a shared server - other people on the server are using https.

I tried
RewriteCond %{SERVER_PORT} ^443$
RewriteRule oldproduct - [G,NC]

Didn’t help.
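From what I can tell reading the docs, the [G] flag gives a 410 Gone rather than a 404, and that rule only matches URLs containing “oldproduct”, so it never touches robots.txt anyway. If I’m reading it right, something like this would be closer to what I’m after (just a guess on my part):

RewriteCond %{SERVER_PORT} ^443$
# "-" leaves the URL unchanged; the G flag supplies the 410 response
RewriteRule ^robots\.txt$ - [G]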

I got a manager on the line. He says it’s normal for this to occur because my site is on a shared server. He says I have to rent an IP address and start SSL service in order to give Google a 404 error for either https or http.

What webhost do you prefer? Maybe Bluehost isn’t worth the trouble.

I put in all the different codes this time:
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl\.txt

RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ 404\.html [R=404,L]

RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ nohttpsatall\.html

RewriteCond %{SERVER_PORT} ^443$
RewriteRule oldproduct - [G,NC]

Also, in robots_ssl.txt I put in

User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /

instead of just

User-agent: *
Disallow: /

I know it’s pathetic and desperate, good enough to insult any decent Apache programmer, but I did it for two reasons: 1) I read Google saying it will read any and all files it reaches, which would include any htaccess file, and 2) one time I was given code to put in my htaccess file that had a .shtml address instead of .html, and Google Webmaster Tools started asking me where the .shtml file was. So maybe its robots were programmed to look for information on https. One hopeful sign is that in just the few days I was trying the different codes, the number of https files Google thinks exist dropped from 18 to 14.

This makes me wonder… what behavior are you now seeing when you visit https://xyz.com/robots.txt, and is it the behavior you wanted?

Apache’s default configuration is such that htaccess files will never be served, so Google will never reach them.
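In the stock configuration that is handled by a Files section along these lines (the exact directives differ a bit between Apache versions):

# httpd.conf default: refuse to serve any file whose name begins with .ht
<Files ".ht*">
    Require all denied
</Files>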

For the https robots page, from Firefox I get “This Connection is Untrusted”. If you choose “I understand the risks” and allow it to add an exception, it takes you nowhere but leaves you at https://xyz.com/robots.txt. But if I go to my home page and add the s, I get the same untrusted/exception stuff, and once through that it takes me to the non-https address and adds the main domain of my webhost account: http://xyz.maindomain.com.

If you put some code in your htaccess file that contains .shtml (assuming you don’t use .shtml in the code for that site), wait about 2 weeks, and you’ll see Google Webmaster Tools saying it got a 404 error for the .shtml file. Give it a try.

Thanks

C77,

The way that this thread has degenerated (and the poor responses from your host), I will simply reiterate my advice to look at the examples near the end of my tutorial for forcing http or https depending upon the files’ requirements. Since you do NOT have a dedicated IP address (a requirement for your own SSL certificate), there is NO reason for you to use https at all, and you really should force http (port 80) on any https request received by your account. If you have any questions, please PM me directly.

Regards,

DK

I looked at your regex trick. How about this:

RewriteCond %{HTTPS} on [NC]
RewriteRule !^$ http://%{HTTP_HOST}%{REQUEST_URI} [R=401,L]

since I don’t have any https pages and would like to incur a 404. No idea what ! before ^ means.

I found this info,
"The best practice is to install the secure certificate on a dedicated subdomain, such as secure.example.com This also avoids having all your regular urls resolve as https - historically that has caused duplicate url problems in Google "
on this site http://www.webmasterworld.com/google/3411545.htm

C77,

The “regex trick” is overly complicated but your approach seems good - with three comments:

RewriteCond %{HTTPS} on [NC] 
# you don't need the No Case flag; Apache knows it will give the "on" in lowercase letters.
RewriteRule  !^$ http://%{HTTP_HOST}%{REQUEST_URI} [R=401,L] 
# not null? IMHO, it's better to guarantee a match with ".?" which does the same thing but far cleaner.
# Of course, do NOT use the quotes I've shown!
# I would use R=301 to show a PERMANENT redirection (so SE's will update their database)

The ^ is the start anchor metacharacter. It has no width but denotes the beginning of the string which, by definition, is the %{REQUEST_URI} in a RewriteRule.
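So, roughly, for the patterns that have come up in this thread:

# ^robots\.txt$  matches ONLY a request for robots.txt
# !^$            matches any request which is NOT empty
# .?             matches anything at all, empty or not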

And you believed that nonsense? I thought you wanted to let people trying to access your website via a secure server know that your content is NOT to be accessed in that manner (it would be through your host’s certificate so it would throw error after error for cert mismatch). You’ve gone about it the right way so PLEASE don’t be distracted with MISinformation.

Regards,

DK

How about this?
RewriteCond %{HTTPS} on
RewriteRule .?http://%{HTTP_HOST}%{REQUEST_URI} [R=401,L]

That code had no effect: I still got “untrusted” from Firefox, and once through the warning I still got directed to http with the main domain added.

C77,

I don’t know if it’s a typo or something about the forum but there was a space missing after the ? in your RewriteRule. While I would have expected a syntax error like that to throw a 500 error, there may be a configuration error on the server (are you sure mod_rewrite’s enabled?).

Otherwise, I’d STILL recommend a response code of 301 so that SE’s will update their https to http links. After all, when you provide a good link to a script, it should get a 200 code, not a 404. The 301 tells SE’s that the redirection is permanent.
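Put together with both of those fixes, the pair of lines would read something like this (a sketch, not copied from a live configuration):

RewriteCond %{HTTPS} on
RewriteRule .? http://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]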

If there is a problem with the server or mod_rewrite’s not enabled, you might add PHP code to your header.php script to ensure that it’s running in a non-secure server. If you need help with that, feel free to PM me. Having said that, though, the first line of defense should be mod_rewrite.

Regards,

DK