Problem with [^/] in mod_rewrite

This mod_rewrite is not behaving as expected…


#RewriteRule ([^/]+)/$ articles/index-section.php?section=$1 [L]

When I go to this URL…


http://local.debbie/finance/economy/

…then why does section=‘economy’?

The whole point of saying [^/] was to make Apache grab finance and disregard everything after the /.

Also, is it correct that you generally want to place GENERIC mod_rewrites before more SPECIFIC ones?

My website has “Sections” and “Subsections”, and so I figured that my mod_rewrites would go in this order…


# SHOW SECTION INDEX
#-------------------------------------------------------------------------------
#PRETTY:		finance/
#UGLY:			articles/index-section.php?section=finance

#Rewrite only if the request is not pointing to a real file.
RewriteCond %{REQUEST_FILENAME} !-f

#Match any kind of Section.  PHP will decide if it's valid or not.
RewriteRule ([^/]+)/$ articles/index-section.php?section=$1 [L]


# SHOW SUBSECTION INDEX
#-------------------------------------------------------------------------------
#PRETTY:	finance/tax-season/
#UGLY:		articles/index-subsection.php?section=finance&subsection=tax-season

#Rewrite only if the request is not pointing to a real file.
RewriteCond %{REQUEST_FILENAME} !-f

#Match any kind of Section and Subsection.  PHP will decide if it's valid or not.
RewriteRule (.+)/(.+)/$ articles/index-subsection.php?section=$1&subsection=$2 [L]


Sincerely,

Debbie

In that case, you probably want a “beginning of string” anchor instead of the “end of string” anchor that you currently have.

RewriteRule ^([^/]+)/ articles/index-section.php?section=$1 [L]
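
To see why the unanchored rule picked up “economy”, the two patterns can be tried outside Apache. This is only a sketch: it uses Python's re module as a stand-in for the PCRE engine behind mod_rewrite (the two agree on patterns this simple), and it assumes the rule lives in a per-directory .htaccess file, so the path is matched without its leading slash.

```python
import re

# Path as mod_rewrite would see it in a per-directory (.htaccess) context:
# no scheme, host, or leading slash. This is an assumption about the setup.
path = "finance/economy/"

# Original rule: anchored only at the end, so the engine slides forward
# until it finds a slash-free run that sits right before the final "/".
end_anchored = re.search(r"([^/]+)/$", path)

# Suggested rule: anchored at the start, so the capture has to begin at
# the first character of the path.
start_anchored = re.search(r"^([^/]+)/", path)

print(end_anchored.group(1))    # economy
print(start_anchored.group(1))  # finance
```

The character class [^/] only limits what a single capture may contain. It says nothing about where in the path the capture starts, which is the job of the ^ anchor.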

Nope. Other way around. You want specific ones first. Because rewrite rules are processed in order, so if the generic ones are applied first, then the specific ones might never get an opportunity to run.
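
A rough way to convince yourself of the ordering point is to simulate “first matching rule wins” with the two rules from the question. The helper first_match below is hypothetical and deliberately simplified: it ignores the RewriteCond checks and the way .htaccess rules are re-run after a rewrite, and it uses Python's re module in place of Apache's regex engine.

```python
import re

# Simplified stand-in for mod_rewrite: try each (pattern, target) in order
# and stop at the first match, as the [L] flag does within one pass.
SECTION = (r"([^/]+)/$", "index-section.php")
SUBSECTION = (r"(.+)/(.+)/$", "index-subsection.php")


def first_match(rules, path):
    """Return the target of the first rule whose pattern matches path."""
    for pattern, target in rules:
        if re.search(pattern, path):
            return target
    return None


path = "finance/tax-season/"
generic_first = first_match([SECTION, SUBSECTION], path)
specific_first = first_match([SUBSECTION, SECTION], path)

print(generic_first)   # index-section.php
print(specific_first)  # index-subsection.php
print(first_match([SUBSECTION, SECTION], "finance/"))  # index-section.php
```

With the generic rule first, the subsection URL never reaches the subsection rule. With the specific rule first, a plain section URL still falls through to the section rule because it lacks the second path segment.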



Okay, Jeff, since you are the king of mod_rewrites and regexs, here is an advanced question dealing with what I truly want… :wink:

If a URL looks like this…


local.debbie/finance/markets/brazil-seeks-higher-power-auction-rate-to-spur-use-of-coal

…then the ARTICLE mod_rewrite (1st one) should kick in.

If the URL looks like any of these…


local.debbie/finance/markets/by-date/desc/3/
local.debbie/finance/markets/by-date/desc/3
local.debbie/finance/markets/by-date/desc/
local.debbie/finance/markets/by-date/desc
local.debbie/finance/markets/by-date/
local.debbie/finance/markets/by-date
local.debbie/finance/markets/

…then the SUBSECTION mod_rewrite (2nd one) should kick in.

Why all of those combinations?

Because if the URL doesn’t point to a properly formed Article, then I want the SUBSECTION mod_rewrite to catch things, and pass it on to my “index-subsection.php” script which will either…

a.) Display Articles sorted in the way requested

b.) Take the malformed URL (e.g. “local.debbie/finance/markets/by-date”) and redirect to a default URL (e.g. “local.debbie/finance/markets/by-date/desc/1”)

I almost have things working, but cannot figure out the ARTICLE mod_rewrite…


# SHOW ARTICLE

#Rewrite only if the request is not pointing to a real file.
RewriteCond %{REQUEST_FILENAME} !-f

#Match any kind of Section, Subsection and Article.  PHP will decide if it's valid or not.
RewriteRule ^([^/]+)/([^/]+)/(?:(?!by-date).)*$ articles/article.php?section=$1&subsection=$2&article=$3 [L]



# SHOW SUBSECTION INDEX

#Rewrite only if the request is not pointing to a real file.
RewriteCond %{REQUEST_FILENAME} !-f

#Match any Message-View, Sort-Name, Sort-Direction, Page combo.  PHP will decide if they are valid.
RewriteRule ^([^/]+)/([^/]+)/((by-date)/?)?(([^/]+)/?)?(([^/]+)/?)?$ articles/index-subsection.php?section=$1&subsection=$2&sortname=$4&sortdir=$6&page=$8 [L]

Any help would be much appreciated!!

Sincerely,

Debbie

It looks like the article name isn’t getting captured. (?:...) is a non-capturing group, so there is no $3 for the substitution to use. I also changed the quantifier * to + so that there has to be at least one character of the article name. You may need to do this:

RewriteRule ^([^/]+)/([^/]+)/((?!by-date).+)$ articles/article.php?section=$1&subsection=$2&article=$3 [L]
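
As a quick check of the capture-count point, the sketch below compiles the pattern from the question next to a variant with capturing parentheses around the third part, and asks how many numbered groups each one has. Python's re module is used as an approximation of Apache's regex engine, and the sample paths are the ones from this thread with the host removed.

```python
import re

# Pattern from the question: the third part sits in a non-capturing group.
original = re.compile(r"^([^/]+)/([^/]+)/(?:(?!by-date).)*$")
# Variant with capturing parentheses around the third part.
revised = re.compile(r"^([^/]+)/([^/]+)/((?!by-date).+)$")

article = ("finance/markets/"
           "brazil-seeks-higher-power-auction-rate-to-spur-use-of-coal")
listing = "finance/markets/by-date/desc/3/"

print(original.groups)  # 2, so there is nothing for $3 to refer to
print(revised.groups)   # 3
print(revised.match(article).group(3))
print(revised.match(listing))  # None, so the next rule gets its turn
```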

I came up with this one on my own, and behaves exactly as I want…


RewriteRule ^([^/]+)/([^/]+)/(?:(?!(by-date|by-title)))([^/]+)$ articles/article.php?section=$1&subsection=$2&article=$4 [L]

But is the ?: necessary?

Debbie

Nope, as written, the ?: is unnecessary. In fact, the set of parentheses the ?: is associated with is also unnecessary, as is the inner-most set of parentheses around by-date|by-title.

I used ?: because it is a “passive, non-capturing group” and since I don’t need to capture “by-date|by-title” it seemed to be the way to go…

Can you help me understand the difference between my code snippet…


	(?:(?!(by-date|by-title)))([^/]+)

And your code…


((?!by-date).+)

Sincerely,

Debbie

BTW, during testing I see my mod_rewrite isn’t quite right yet.

Here is the problematic snippet…


(?:(?!(by-date|by-title)))([^/]+)

What it should do is this…

IF the 3rd part of the url is NOT either “by-date” OR “by-title” THEN assume the value is an Article slug and go to “article.php” ELSE the value is one of those two values then it is a valid Sort-Name, so drop down to the Subsection mod_rewrite and ultimately go to “index-subsection.php”

If my URL looks like this…


www.debbie.com/finance/economy/postage-meters-can-save-you-money

…then I go to “article.php” which is correct

If my URL looks like this…


www.debbie.com/finance/economy/by-date

…then I go to “index-subsection.php” which is correct

But if my URL looks like this…


www.debbie.com/finance/economy/by-date2

…then I go to “index-subsection.php” which is WRONG!!!

Not sure what is wrong with my regex?! :-/

Sincerely,

Debbie

Basically that’s right. The reason I said it was unnecessary is because your passive parentheses are wrapping something that’s already wrapped in parentheses.

before => (?:(?!(by-date|by-title)))([^/]+)

after => (?!(by-date|by-title))([^/]+)

Then you also have another set of parentheses – the capturing kind – around by-date|by-title, which is why in your substitution URL, you had to skip $3 and instead use $4.

before => (?!(by-date|by-title))([^/]+)

after => (?!by-date|by-title)([^/]+)

Now there aren’t any extra parentheses.
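
The effect of each simplification on the group numbering can be checked mechanically. This is a small sketch using Python's re module in place of Apache's engine; the variants dictionary and its labels are made up for the illustration.

```python
import re

prefix = r"^([^/]+)/([^/]+)/"
variants = {
    "original": prefix + r"(?:(?!(by-date|by-title)))([^/]+)$",
    "outer group removed": prefix + r"(?!(by-date|by-title))([^/]+)$",
    "inner group removed": prefix + r"(?!by-date|by-title)([^/]+)$",
}

path = "finance/economy/postage-meters-can-save-you-money"

for name, pattern in variants.items():
    compiled = re.compile(pattern)
    match = compiled.match(path)
    # The slug always lands in the last group; only its number changes.
    print(name, compiled.groups, match.group(compiled.groups))
```

The first two variants report 4 groups (slug in $4) and the last one reports 3 (slug in $3). All three match exactly the same paths.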

Ahh, yes. We’ll have to check that “by-date” is followed by either a slash or the end of the string to make sure there isn’t anything else in the path segment, like a “2”. :)

^([^/]+)/([^/]+)/(?!(?:by-date|by-title)(?:/|$))([^/]+)$
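
Here is a small side-by-side of the look-ahead with and without the “slash or end of string” check, run against the three URLs from the test report above (host removed). It is a sketch with Python's re module standing in for Apache's engine, and the names loose and strict are only labels for this example.

```python
import re

prefix = r"^([^/]+)/([^/]+)/"
# Rejects anything that merely starts with by-date or by-title.
loose = re.compile(prefix + r"(?!by-date|by-title)([^/]+)$")
# Rejects only a segment that is exactly by-date or by-title.
strict = re.compile(prefix + r"(?!(?:by-date|by-title)(?:/|$))([^/]+)$")

paths = (
    "finance/economy/postage-meters-can-save-you-money",
    "finance/economy/by-date",
    "finance/economy/by-date2",
)

for path in paths:
    # True means the ARTICLE rule would fire for this path.
    print(path, bool(loose.match(path)), bool(strict.match(path)))
```

Only the last path differs: the loose pattern rejects “by-date2” because it starts with “by-date”, while the strict pattern treats it as an article slug.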

It looks like that fixed my problem - THANKS!!

This is some hard-core stuff!! (:

If you don’t mind, let me try and put your code into plain English to see if I understand what is going on…

New mod_rewrite


RewriteRule ^([^/]+)/([^/]+)/(?!(?:by-date|by-title)(?:/|$))([^/]+)$ articles/article.php?section=$1&subsection=$2&article=$3 [L]

mod_rewrite explained


^			Start of Regex

([^/]+)			One or more of anything but a Slash

/			Required Slash

([^/]+)			One or more of anything but a Slash

/			Required Slash

(?!			Forward Negative Assertion??
			Look forward but do not capture
			IF Not Match THEN continue, ELSE fail

(?:by-date|by-title)	Forward Assertion??
			Look forward but do not capture
			IF Match of either THEN continue, ELSE fail

(?:/|$))		Forward Assertion??
			Look forward but do not capture
			IF Match of either THEN continue, ELSE fail

([^/]+)			One or more of anything but a Slash

$			End of Regex

mod_rewrite explained in prose…
Look for a 1st variable without a slash in it, then a slash, then a 2nd variable without a slash, then a slash, then IF the 3rd variable is NOT either “by-date” followed by a slash or end of string OR is NOT “by-title” followed by a slash or end of string THEN look for the 3rd variable without a slash, and if all of these conditions are true, then goto “article.php” otherwise drop through to the SUBSECTION mod_rewrite.

Does that sound correct??

Sincerely,

Debbie

A non-capturing group.

By and large yes. Only minor difference is that “by-date” followed by a slash or end of string isn’t the 3rd variable. The by-date|by-title stuff is not enclosed in capturing parentheses, so it isn’t a variable at all. But the “One or more of anything but a Slash” following it is enclosed in capturing parentheses, which makes that the 3rd variable.
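
Putting the two rules together, the routing described in plain English can be sketched as a tiny dispatcher. This is an approximation, not how Apache itself is implemented: route is a hypothetical helper, the RewriteCond file checks are left out, and Python's re module stands in for the regex engine. The patterns are the ARTICLE and SUBSECTION rules from this thread.

```python
import re

ARTICLE = re.compile(
    r"^([^/]+)/([^/]+)/(?!(?:by-date|by-title)(?:/|$))([^/]+)$")
SUBSECTION = re.compile(
    r"^([^/]+)/([^/]+)/((by-date)/?)?(([^/]+)/?)?(([^/]+)/?)?$")


def route(path):
    """Return (script, params) for the first rule that matches, else None."""
    m = ARTICLE.match(path)
    if m:
        return "article.php", {"section": m.group(1),
                               "subsection": m.group(2),
                               "article": m.group(3)}
    m = SUBSECTION.match(path)
    if m:
        # Unset optional groups come back as None here; Apache would
        # substitute an empty string instead.
        return "index-subsection.php", {"section": m.group(1),
                                        "subsection": m.group(2),
                                        "sortname": m.group(4),
                                        "sortdir": m.group(6),
                                        "page": m.group(8)}
    return None


for path in ("finance/markets/some-article-slug",
             "finance/markets/by-date/desc/3/",
             "finance/markets/by-date",
             "finance/markets/"):
    print(path, "->", route(path)[0])
```

Note that the look-ahead contributes no variable of its own: the article slug is group 3 because the look-ahead and the non-capturing groups inside it are skipped when the groups are numbered.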

What is the difference between a “non-capturing group” and an assertion?

To me they sound like one and the same, which may explain my confusion with all of this…

Debbie

The assertions – positive and negative look-ahead, and positive and negative look-behind – are zero-width patterns. What that means is, they peek at the characters around them, but they don’t advance the match position.

A non-capturing group, on the other hand, matches normally. Its sole purpose is to provide grouping behavior. For example…

abc* => matches “a” then “b” then 0 or more “c”

(?:abc)* => matches 0 or more “abc”

They work almost exactly like regular capturing parentheses. The only difference is that (abc) will remember the contents of that group for later, and (?:abc) won’t.
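
The three distinctions (zero width, grouping, capturing) can each be observed directly. The sketch below uses Python's re module, which treats these constructs the same way as the engine behind mod_rewrite; the sample strings are arbitrary.

```python
import re

text = "abcabc"

# Look-ahead assertion: succeeds, but consumes nothing (zero width).
lookahead = re.match(r"(?=abc)", text)
# Non-capturing group: consumes the characters it matches.
noncapturing = re.match(r"(?:abc)", text)
print(lookahead.end(), noncapturing.end())  # 0 3

# Grouping changes what the * applies to.
print(re.match(r"abc*", "abccc").group(0))       # abccc
print(re.match(r"(?:abc)*", "abcabc").group(0))  # abcabc

# Capturing vs non-capturing: only the former creates a numbered group.
print(re.compile(r"(abc)").groups, re.compile(r"(?:abc)").groups)  # 1 0
```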