Urls without extensions

Greetings all, and Happy Valentine's Day.

I’ve been through quite a number of places on the interwebs this morning searching for the .htaccess rules, or setup, that will strip the .php or .html from the URL displayed in the browser's address bar.

I’ve struck out so far.

Has anyone a suggestion or definitive set of .htaccess rules for invoking this? Even the ones from an article here from years ago failed me.

The site will be PHP-based, with static files and only 10 to 15 pages. So all I’m really after is having/forcing something like

xxxxxxxx.com/all-about-company.php

to be displayed as

xxxxxxxx.com/all-about-company/

in the address bar when loaded.

I’m on a linux server.

Thanks in advance for any crumbs.

Michael

First off, you are approaching it backwards.

Think of it like this: you want to go to /all-about-company/ and have it serve up all-about-company.php.

For example:

RewriteEngine on
# if a directory or a file exists, use it directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# take the url and append .php to it
RewriteRule ^([a-z0-9-]+)$ $1.php [L]
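The original question asked for a trailing slash (/all-about-company/), which the `^([a-z0-9-]+)$` pattern won't match. A minimal variant, assuming the .php files and the .htaccess file both sit in the document root, might look like this. It also only rewrites when the matching .php file really exists.

```apache
RewriteEngine on

# Leave real directories alone
RewriteCond %{REQUEST_FILENAME} !-d
# Only rewrite when a matching .php file exists in the document root
# ($1 is the name captured by the RewriteRule pattern below)
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
# Accept the name with or without a trailing slash and serve name.php
RewriteRule ^([a-z0-9-]+)/?$ $1.php [L]
```

One side effect of the trailing slash: the browser treats /all-about-company/ as a directory, so relative links to CSS, images, or other pages resolve underneath it. Root-relative paths (starting with /) avoid that.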

Thanks cpradio for the quick reply. And I see we’re ‘neighbors’, as I’m punching keys here from Miamisburg. :wink:

Will this addition to the htaccess file then ‘strip’ the extension from the page loaded?

And my naivety comes from being hand-held in these matters by WP for a number of years, and from not really worrying about it when developing smaller sites.

I’d like to add something real quick, because I’m only now discovering that it’s been a recurring topic on these forums for a while.

For rewrites such as this, it’s actually standard, and recommended in the Apache documentation, to match on (.*) rather than ([a-z0-9-]+). The former is simpler, is more accurate (it matches any URL), and has no drawbacks.
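For comparison, here is a sketch of the catch-all form written out in full. The extra `-f` condition is an assumption added here: without it, a request for a page that has no .php file gets rewritten again and again (foo, then foo.php, then foo.php.php, and so on) until Apache hits its internal recursion limit and returns an error. With `[a-z0-9-]+` that can't happen, because the rewritten name contains a dot and no longer matches.

```apache
RewriteEngine on

# If a directory or a file exists, use it directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Exit condition: only append .php when that file really exists
RewriteCond %{REQUEST_FILENAME}.php -f
# Match any URL and append .php to it
RewriteRule ^(.*)$ $1.php [L]
```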

I can sense the grok coming, but I’m still a bit wobbly. The snippet above does allow the following links to resolve correctly in the browser address bar:

<!doctype html>
<html>
<head>
<meta charset="UTF-8">
<title>Hello world</title>
</head>


<body>
    <h2>Hello world</h2>

<p>These are two links to check the stripping of the url's extension</p>

<ul>
    <li><a href="test">Test page</a></li>
    <li><a href="test2">Test page 2</a></li>
</ul>
</body>
</html>

Meanwhile, the files themselves are named “test.php” and “test2.php”.

Is that the magic I seek: simply strip the extension off the link, let the snippet resolve the file with .php and load it, but keep the link URL, without the extension, in the address bar?

If so, I can see the validity of cpradio’s comment that I was looking at it ‘backwards’.

Is the other method undoable, though? Just curious.
That is, given

<li><a href="test.php">Test page</a></li>

is there not a way in which that link is loaded, but with the .php stripped in the address bar? Again, just curious.

Thanks again, to both cpradio and Jeff

Michael

That is indeed the magic. :slight_smile:

Sort of, yes. You can have Apache send a redirect response. That will cause the browser to re-fetch the page at the new URL, thereby changing the URL in the address bar. But there’s a problem in this situation. Using both this redirect rule along with the earlier rewrite rule is likely to create an infinite loop. A .php URL would redirect to a bare URL, then the bare URL is rewritten to a .php URL, but .php URLs redirect to the bare URL, and so on. I suspect there’s some trick to get around that, but until someone here can figure out what that trick is, you may have to skip this feature for now.
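One commonly used trick, offered as a sketch rather than a tested drop-in, is to base the redirect on `%{THE_REQUEST}`, the original request line the browser sent (for example `GET /test.php HTTP/1.1`). Internal rewrites don't change it, so once /test has been rewritten to test.php, the redirect condition no longer matches and the loop never starts. This assumes the pages live in the document root and use the same lowercase/digits/hyphen names as cpradio's rule.

```apache
RewriteEngine on

# 1. External redirect, only when the browser itself asked for name.php
#    (THE_REQUEST holds the raw request line, e.g. "GET /test.php HTTP/1.1")
RewriteCond %{THE_REQUEST} \s/([a-z0-9-]+)\.php[\s?]
RewriteRule ^ /%1 [R=301,L]

# 2. Internal rewrite: serve name.php for the bare URL
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-z0-9-]+)$ $1.php [L]
```

While experimenting, R=302 is the safer choice, since browsers cache 301 redirects.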

XSS? No drawbacks? I hardly think so :smiley: Try doing an XSS attack with my implementation :slight_smile: Granted, this one isn’t appending the match to a URL parameter, so an attack is less likely. But I digress: it’s better to give examples that won’t open up such attacks if used differently than to provide ones that could permit them.

[ot]My Rant
Sorry, but this irks me. Please note that ALL uses of (.*) shown at http://httpd.apache.org/docs/2.2/rewrite/intro.html (assuming this is the reference you are referring to) are for PATHS and FILENAMES, never for VARIABLES.

Granted, we are talking about file names here, so I will concede you can use it; however, I still never recommend it, as you are allowing more than may be intended. It may be intended that you only consider the root directory, and using (.*) permits ALL directories. That could very well be a security issue if you have admin pages, or files in folders you did not want to be a part of this RewriteRule.[/ot]

Back to the topic at hand, Jeff is correct you will end up in an infinite loop if you try to redirect *.php to its non-extension form, and you have a rule that internally redirects the non-extension form to .php. You will want to program the links to all of your pages to not use the extension so people browsing will never see the extension.

[ot]

Can you give an example how (.*) would permit XSS?

Indeed. If it’s not your intention to match any URL, then that’s a case where you shouldn’t match on any character. BUT if you do want to match any URL, then matching any character is exactly what you need.[/ot]

All’s well.

Thanks again for the assistance.

To cpradio for the snippet, and for pointing out that I was thinking about it backwards.

And to Jeff for clarification and explanation.

The toy site is here for now: http://ourperfectnight.com/testing-rewrite/

Cheers from Southwest Ohio,

Michael

RewriteEngine on
# if a directory or a file exists, use it directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# pass the whole URL to index.php as the name parameter
RewriteRule ^(.*)$ index.php?name=$1

Sample URL (if field is not properly protected, it will output a script tag that loads an external JavaScript file):

mydomain.com/%3Cscript+type%3D%22text%2Fjavascript%22+src%3D%22myotherdomain.com%2Fmyscript.js%22%3E%3C%2Fscript%3E

Glad it worked. It is a common mistake that everyone makes when getting started with RewriteRules: we all tend to think it takes the .php URL and redirects it to the extensionless one. Not sure why that is, but I know I made that mistake early on too :smiley:

The part in parentheses above (“if field is not properly protected”) is crucial. Your scenario assumes that we failed to use htmlspecialchars, and that is what would really allow XSS. The URL is always provided by the user, and is therefore tainted, whether we rewrite it or not.

I apologize for the tangent in the conversation. I’m still toying with ways to solve your second request, to remove .php from the address bar. I don’t have a solution yet, but here’s a quick preview of what I’m toying with.

    # If the request is not a subrequest (that is, not a rewritten URL)
    <If "%{IS_SUBREQ} == 'false'">

        # Then redirect to the bare URL
        RedirectMatch ^(.*)\.php$ $1

    </If>
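The `<If>` preview likely won't behave the way its comment hopes, because an internal rewrite in .htaccess is handled as an internal redirect rather than a subrequest, so IS_SUBREQ stays false on the rewritten pass too. A related mod_rewrite idiom, sketched under the same assumptions as the rules above, checks the REDIRECT_STATUS environment variable instead: it is empty on the browser's original request and gets set once Apache has internally redirected.

```apache
RewriteEngine on

# Redirect name.php to the bare URL, but only on the browser's original request
# (REDIRECT_STATUS is empty there and set after an internal redirect)
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^([a-z0-9-]+)\.php$ /$1 [R=301,L]

# Serve name.php for the bare URL
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-z0-9-]+)$ $1.php [L]
```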

Yes it is, but I’d still argue it is better to assume the worst and provide something similar that matches plenty but excludes <, >, etc. to harden the code, than to provide a match-everything pattern that gives a vulnerability another path of entry.

You could always use [a-z0-9/\s-]+ and that will in most cases be everything you need to support from a file name stand point. Why capture < and >?
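Applied to the index.php?name= example, that suggestion would look roughly like this. It's a sketch: the character class comes from the post above, `B` is added so the captured value is escaped before it goes into the query string (a captured space would otherwise land there raw), and `QSA` keeps any existing query string. As noted elsewhere in the thread, it narrows what reaches the script but is no substitute for escaping the value (for example with htmlspecialchars) when it's output.

```apache
RewriteEngine on

# If a directory or a file exists, use it directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Only hand "name-like" paths to the script; paths containing <, >, quotes,
# etc. don't match, so they are not passed to index.php at all
RewriteRule ^([a-z0-9/\s-]+)$ index.php?name=$1 [B,QSA,L]
```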

I’d argue that you’re trying to plug a hole… but missing the hole. .* does not allow XSS, nor does avoiding .* prevent XSS.

Yes, I can agree with that, but it still teaches a good paradigm: validate your input, or only take what you need. It is one more layer of validation, stripping out what isn’t necessary (granted, it only strips out the invalid characters if your RewriteRule runs, so I concede that argument). However, I can’t think of any valid reason to name a file with < or > in it, so I still think it stands to reason to eliminate those characters. I don’t know, maybe it’s my education that taught me to capture only what you need, not everything just to parse it out later.

I’ve yet to run into a situation where I needed to capture everything. I’ve always known at least one restriction or character I could eliminate. Just my opinion, @cajebo; you are more than welcome to capture everything, I just strongly recommend against it unless you know for a 100% fact you are not opening yourself to an XSS attack of some sort (and if you run any third-party software, you can’t make that guarantee). Granted, you could still have one, but at least you could argue security through obscurity (no one knows where your data is being redirected to, but that isn’t a very fair answer either).

I really wanted to leave the discussion as it was, but this kept nagging at me. It’s worth reiterating that if you’re vulnerable to an XSS attack and you avoid using .*, you’re still vulnerable to an XSS attack. .* is neither the cause, nor is avoiding it the cure.

Folks,

IMHO, Jeff Mott is unaware of the security issues and of the poor logic that (.*) invites: newbies end up with loopy code for lack of understanding exactly what (.*) does and how it does it.

[rant #1] The use of “lazy regex,” specifically the EVERYTHING atom, (.*), and its close relatives, is the NUMBER ONE coding error of newbies BECAUSE it is “greedy.” Unless you provide an “exit” from your redirection, you will ALWAYS end up in a loop! [/rant #1]

Arguing is not going to resolve the significant difference of opinion and only provides a platform for spreading confusion. Please allow Jeff his own opinion and continue to create mod_rewrite code correctly, i.e., with the best specification possible in your regex because it will avoid the silly problems which will get dumped on your receiving scripts.

I’ve said my piece and will continue my rants against the inappropriate use of (.*), but I will not engage in silly discussions of why everyone should use it indiscriminately. I grant that there are times when (.*) is the best regex to use, but only if you know what you’re doing with it and, as evidenced above, that is not always the case.

'Nuf said.

Regards,

DK