Larry Ullman's Book Forums

One Mod Rewrite Rule Won't Work



Maybe you know if I can do this.

I want the urticaria rule to be hives. But the hiv-aids rule always catches it. It will not even let me use hives-treatment.php for the filename.

 

RewriteRule ^urticaria[a-z_\-\./]*$ urticaria-treatment.php

RewriteRule ^hiv|aids[a-z_\-\./]*$ hiv-aids-treatment.php

 

So this rule goes to hiv-aids:

RewriteRule ^urticaria[a-z_\-\./]*$ hives-treatment.php

 

But if it can't be done it's not the end of the world.

And what I really want is this rule to go to hives but even the above rule would be better than that filename urticaria.

RewriteRule ^hives[a-z_\-\./]*$ hives-treatment.php

 

Maybe there is a way to tell the hiv-aids rule to ignore (hive).

 

Also, my code shows the url ending with hiv-aids-treatment.php, so I can't disallow letter 'e'.


It only partially worked. It did let hives be recognized, but now it doesn't recognize hiv at all. Of course, through code I use hiv-aids-treatment, which works. But I also want the user to be able to type in hiv or aids and get to hiv-aids.

These are the 3 I tried with same results (hiv gives 404 error):

 

RewriteRule ^hiv[^e]|aids[a-z_\-\./]*$ hiv-aids-treatment.php

 

RewriteRule ^hiv[^e][a-z_\-\./]|aids[a-z_\-\./]*$ hiv-aids-treatment.php

RewriteRule ^hiv[^ea-z_\-\./]|aids[a-z_\-\./]*$ hiv-aids-treatment.php


The more I think about this, the trickier it is in the sense that I don't know exactly what you want.

To clarify, is the following what you want?

- "hiv.php" is okay.

- "hiv" followed by lowercase letters, underscores, hyphens, periods, and forward slashes (except for the letter combo "es" after "hiv") and then ".php" is okay.

- "aids" followed by lowercase letters, underscores, hyphens, periods, and forward slashes and then ".php" is okay.

 

That seems to be what you're asking. It very much is possible, but by doing that I think there are two downsides:

1) Your regex becomes more complex and slower.

2) Adding one-off exceptions like "hiv" not followed by "es" is not a good way to future proof your regex for combinations you have not yet considered.

 

As a result, I might change the regex slightly to reflect something like the following:

"hiv" and "aids" cannot be immediately followed by an English letter or underscore, but they can be followed by a hyphen, period or forward slash, which can then be followed by lowercase letters, underscores, hyphens, periods, and forward slashes.

 

That to me is more practical and simpler, and will better future proof your script.

What do you think?


I should point out that I only understand a small amount of regex. So the reason it is the way it is now is that I got it working step by step, mostly by looking online and also from a previous post I had here. So changes are good, provided it ends up working.

 

Here then is what I am doing overall:

 

I have many files (webpages) which I am adding to and probably might be about 50 or even more in the end. I could post the current file if you need it.

 

For SEO and to make the url prettier I am doing the modrewrite. But because I am doing this it also is very convenient to allow user to type in directly just that name and they can get there that way. So that is the secondary goal.

 

All this has been working well until now, when I hit the conflict between hives and hiv-aids. I did have a previous similar conflict with cancer and cancer-symptoms, but I solved that by using c-symptoms instead.

 

Generally what I do is my code will issue:

http://adviceofthequeen.com/cancer-treatment

which modrewrite sends to cancer-treatment.php

Then my user can type in:

adviceofthequeen.com/cancer

which modrewrite also sends to cancer-treatment.php

In some cases I let them shorten it such as leukemia they only have to type leuk.

 

So for hiv-aids my code will issue:

http://adviceofthequeen.com/hiv-aids-treatment

which modrewrite sends to hiv-aids-treatment.php

User should be able to type either:

adviceofthequeen.com/hiv

or:

adviceofthequeen.com/aids

and modrewrite sends to hiv-aids-treatment.php

 

I don't expect anyone would enter hiv.php or aids.php. The code wouldn't have to support that.

 

But it seems reasonable to me that if someone types in hiv-aids-treatment.php then that file should be displayed because that file exists.

 

I noticed online I should be able to disallow ^(hive) but it didn't work for me and maybe I didn't know where to place it.

 

Let me know if it is still unclear. And thanks for taking time to try to figure this all out.


abigail, I applaud you for taking the time to learn how to use regexes.

For me, regexes have been one of the most challenging aspects of computer science I have ever studied. I ended up trying to learn them several times before I finally buckled down and properly learned them.

With such a great challenge though, also comes a great sense of accomplishment and power when you finally learn them, which I'm sure you will gain too with some more practice.

 

Given what you're going for, I would probably simplify your regexes by saying that no English letters can immediately follow the target word/phrase, but after that, there can be English letters. By doing so, "hiv" followed immediately by a letter will not be interpreted as referring to hiv-aids-treatment.php, but if there is a hyphen, underscore, period or slash immediately after "hiv", then whatever comes after that will automatically be interpreted as pointing to hiv-aids-treatment.php. By doing this, you won't have any more issues with "hiv" vs. "hives".

 

As for the ambiguity with the cancer pages, you could write a regex that points http://www.adviceofthequeen.com/cancer to one of the two files, and then point some other URL to the other file, but I'm not sure I'd go that far. To me, while mod_rewrite does make typing URLs easier, for the most part, the benefit is making URLs easier to read. Nowadays, so few people actually type in long URLs (even if the URLs are simple) that such a benefit is hardly there, in my opinion. To that end, if I saw cancer-treatment.php and cancer-symptoms.php, I'd have no problem with that. To me, mod_rewrite isn't even necessary.

 

Anyway, getting back to the actual regex, you could modify the [a-z_\-\./]* part of your regexes to the following to do what I suggested:

[_\-\./][a-z_\-\./]*
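To see what that change to the tail does, here is a rough check in Python's re module instead of Apache (an assumption on my part: for character classes and anchors this simple, the two engines should agree). The variable names and sample paths are made up for illustration.

```python
import re

# Original tail: zero or more allowed characters, so letters may follow "hiv" directly.
loose = re.compile(r"^hiv[a-z_\-\./]*$")
# Suggested tail: the first character after "hiv" must be a separator (_ - . /).
strict = re.compile(r"^hiv[_\-\./][a-z_\-\./]*$")

samples = ["hiv", "hiv-aids-treatment", "hives", "hiv.php"]
results = {s: (bool(loose.search(s)), bool(strict.search(s))) for s in samples}

for s, (is_loose, is_strict) in results.items():
    print(f"{s!r}: loose={is_loose} strict={is_strict}")
```

The loose tail accepts hives, and the strict tail rejects it. Note that the strict tail on its own also rejects a bare hiv, because it demands at least one separator after the word.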

 

I should also mention that I'm not trying to tell you how to do something; I just wanted to share my opinion as an avid website user.


Here is what I have now:

RewriteRule ^hiv[_\-\./]|aids[a-z_\-\./]*$ hiv-aids-treatment.php

And it does everything except if user types hiv it does 404

User must do hiv-

Is there a way to say 'or nothing'?

But I don't understand why aids works but hiv doesn't.

 

When I put the first bracket before the second bracket, it caught hives.

 

What I'm thinking is hiv-aids is a much more sought after condition than hives, so if I can't use hives I will have to use Urticaria.

 

Because of what I wrote below I just thought of an idea: could I use filename urticaria-hives-treatment. It doesn't allow user to type hives but url looks better and better for SEO.

 

Don't worry about trying to tell me what to do. Kicking around ideas is the best way to come up with the best method/solution.

 

As for my tackling regex, I really didn't want to. I knew what a headache it was going to be. But when I realized that the best thing I could do for SEO, and it would be very significant for SEO, is the modrewrite, I went ahead anyway. Larry gives some info in the book which got me started.

 

Here's the thing to understand. Users typing in is secondary. But now that I can provide that it is really nice. For example I can send you an email and say "look at my new page about cancer at adviceofthequeen.com/cancer". I don't have to give them a link even. And I don't have to type in long or complicated filenames. In addition, someone might get to my site that way then instead of using my navigation, says to himself, I wonder if she has treatment for leukemia and he can even just type leuk into browser and get around lots of my website that way.

 

But here's the thing. The real advantage is in SEO canonicalization. My code directs everything to one url: /hiv-aids-treatment, no matter how it got there.

And there is one more detail I forgot to mention. My code also adds #hiv-aids-treatment to the end. And google would think that is a different url. So as I see it, there are only 2 ways to do it. Either only use /hiv-aids-treatment.php (without #), or use /hiv-aids-treatment and everything is canonical to that.

 

The only issue I have, and apparently this can't be solved, is that once I have a rule that catches 'cancer', then I can't use 'cancer' any place else. I can't even use it in the file name. I can't use 'cancer symptoms'. That is why I abbreviated it and for this instance it doesn't matter because no-one will be searching for cancer symptoms treatment. But there are others that collide and I will solve most of them by combining them into a grouping.

 

But here is one example: I am now working on adding Mental Sharpening, which is a major term. Later I will add Mental Retardation. Which I will have to name retardation-treatment instead of mental-retardation-treatment, or even retardation-mental-treatment (unless I can use this because mental is not the first word). It's not very significant for user but for google it is. For example, because of drought we have retarded plant growth, etc. Google can bring up my website for plant retardation treatment.

 

But even all of these collisions are still minor. Most people would find these because they are already at my site or by word of mouth. The major ones are what I need google to see.


Whoa! That was a lot to say all at once.

I guess the first thing you need to do is decide exactly what you want your final URLs to look like, and then go from there.

Also, you mentioned that you can't have "/cancer" point to two separate scripts. That is correct and is just something you'll have to live with.

 

As for regexes, I understand the fact that you are (begrudgingly) learning them because you need to, not because you want to. I was in the same boat as you, but now that I know them, I like them.

Also, I should have better explained the regex in my previous post, as it's not a complete solution by itself. I was kinda hoping you could get the rest from there.

A more complete regex to handle everything you want would be something like the following:

 

^(hiv|aids)$|[_\-\./][a-z_\-\./]*$

 

Note that the "hiv|aids" part needs to be in parentheses, or else it won't work properly.

The second vertical bar allows you to have only "hiv" or "aids" and then end the string there, or have stuff after "hiv" and "aids", in which case, the character immediately following cannot be an English letter.

Basically, I think that's what you want, but again, you'll need to verify that.
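On the point about parentheses around "hiv|aids": a vertical bar has the lowest precedence in a regex, so without grouping it splits the entire pattern, anchors included. A small sketch in Python's re module (not Apache itself, and the sample strings are hypothetical) shows the difference:

```python
import re

# Without parentheses, "|" splits the whole pattern: "^hiv" OR "aids$".
ungrouped = re.compile(r"^hiv|aids$")
# With parentheses, both anchors apply to whichever word is chosen.
grouped = re.compile(r"^(hiv|aids)$")

samples = ["hiv", "aids", "hives", "first-aids"]
matches = {s: (bool(ungrouped.search(s)), bool(grouped.search(s))) for s in samples}

for s, (is_ungrouped, is_grouped) in matches.items():
    print(f"{s!r}: ungrouped={is_ungrouped} grouped={is_grouped}")
```

The ungrouped pattern accepts hives (it starts with hiv) and first-aids (it ends with aids). The grouped one accepts only hiv and aids. The same precedence rule applies to any other top-level vertical bar in a pattern.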

 

As a final note (for this post, at least), you can add ?: after the left parenthesis of capturing groups to make them non-capturing, if that's necessary. In terms of regex engine efficiency, I don't know which one is quicker though.
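For what it's worth, here is a quick Python check of what ?: changes (Python's re module rather than Apache, so treat it as an illustration only). It says nothing about speed. It only shows that the two forms match the same inputs and differ in whether a group is recorded.

```python
import re

capturing = re.compile(r"^(hiv|aids)$")
non_capturing = re.compile(r"^(?:hiv|aids)$")

# Both forms accept and reject the same inputs...
same_results = all(
    bool(capturing.search(s)) == bool(non_capturing.search(s))
    for s in ["hiv", "aids", "hives"]
)

# ...but only the first records a group (the kind mod_rewrite exposes as $1).
print(same_results, capturing.groups, non_capturing.groups)
```

So if the rule never uses $1 in its substitution, either form should do.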


Well, that does not work. I believe it is syntax error because it doesn't find my style sheet. When I remove one or both of the $| then it can't find /hiv or /aids.

 

Maybe this isn't worthwhile. All this time spent. It's pretty minor, really.

 

Actually, it's not cancer going to 2 scripts that I want.

I wanted /cancer goes to /cancer-treatment

/cancer-symptoms-treatment goes to /cancer-symptoms-treatment

or /cancer-symptoms could go to /cancer-symptoms-treatment

This it will not let me do.

I'm pretty sure it wouldn't let me do this either:

/sympt goes to /cancer-symptoms-treatment

 

But these things are so minor, really. Look what it can do. It's working great for 99% of it, and good for the other 1%.

 

I did another test and it will let me do

/sympt goes to /symptoms-cancer-treatment

 

So this is another option I didn't think of.

I could do /urtic goes to /urticaria-hives-treatment

and at least user will see hives and also google will see both.


Hmmm... that's weird.

Granted, I only tested out my regex by using the JS test method, but my regex returned true for things like "hiv", "hiv_", "hiv-whatever", etc., and at the same time, it returned false for things like "hive" and "hives".

Hmmm... I wonder if the regex syntax for mod_rewrite is more strict than for JS. I guess the only way to know for sure is to write my own mod_rewrite script.

Sorry for the bum regex. When I have time later, I'll write my own mod_rewrite script and test things out and report back.

Please be patient, and don't give up. I think we're close.


If you think it's worth doing I won't give up.

Too bad you have to take time to write code.

You could test my code, all except writing the .htaccess file.

You would have to believe that I type in what you tell me to.

Then you can see the results for yourself.

I don't know if that would save you any time.

One thing to consider -- my page did not use the style sheet so I didn't test further. Maybe it would do as you say but there is something wrong anyway.


Well, I think that we've come this far, so we shouldn't give up just because of a little roadblock. I will try and resolve this issue by the end of the day, so don't worry. Also, I enjoy doing this, so it's okay. Furthermore, I don't need to write that much code to test it, so again, please don't worry.

 

As for the style sheets not loading, that sounds like a bad path and nothing else. One problem at a time though, I suppose.


I don't really have to rush with this because I have it working, just not ideally working.

I will test the style sheet further tomorrow, and maybe you want to wait to see what I find out before you do yours.

But the only other time I had a problem with the style sheet was with the final /, which I haven't been testing for these cases. And then I had an extra space or a missing space, and my website wouldn't load at all and it gave a 500 error.

Otherwise I haven't had any problems with the style sheet while working with the modrewrites.


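Just tested it out, and I found one small problem with my regex. I first suspected the Apache regex engine of being more strict than the JS one, but the real issue looks like grouping: without an extra set of parentheses, the second vertical bar splits the whole pattern in two, so the right-hand half can match on its own. I needed an extra set of parentheses to make it work. Anyway, here's the final regex I used: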

 

^(hiv|aids)($|[_\-\./][a-z_\-\./]*$)
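To make the effect of the extra parentheses concrete, here is a comparison of the earlier pattern and this one in Python's re module (an assumption: Apache's engine should agree on patterns this simple). The sample paths, including style.css, are hypothetical.

```python
import re

# Earlier attempt: the top-level "|" makes the right-hand half a standalone alternative.
ungrouped = re.compile(r"^(hiv|aids)$|[_\-\./][a-z_\-\./]*$")
# Final version: the "end, or separator plus tail" choice is tied to (hiv|aids).
grouped = re.compile(r"^(hiv|aids)($|[_\-\./][a-z_\-\./]*$)")

samples = ["hiv", "hiv-aids-treatment", "hives", "hives-treatment", "style.css"]
results = {s: (bool(ungrouped.search(s)), bool(grouped.search(s))) for s in samples}

for s, (is_ungrouped, is_grouped) in results.items():
    print(f"{s!r}: ungrouped={is_ungrouped} grouped={is_grouped}")
```

The ungrouped pattern matches anything that ends in a separator followed by allowed characters, so hives-treatment and style.css match it too. That is one plausible explanation for the style sheet not loading. The grouped pattern rejects both.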

 

And here are the two files I created for testing purposes:

 

.htaccess file

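# Turn on the rewrite engine for this directory.
RewriteEngine on
# Match hiv or aids, alone or followed by a separator (_ - . /) and then more allowed characters.
RewriteRule ^(hiv|aids)($|[_\-\./][a-z_\-\./]*$) hiv-aids-treatment.php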

 

hiv-aids-treatment.php file

<?php
echo 'This is hiv-aids-treatment.php.';
?>

 

I then put both of those files in the same directory and tested it in XAMPP, and it seems fine. All the following worked:

hiv

hiv-

hiv_es

hiv.adasdasdasd

 

And the following did not:

hive

hives

hive-asdasdasd

hive_

 

Basically, it worked the way it should.
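If you want to re-check those lists without touching the server, something like the following Python sketch should do. It only exercises the regex, not mod_rewrite itself, and rewrite() is a hypothetical helper that returns the rule's target when the pattern matches.

```python
import re

# Pattern and target copied from the RewriteRule above.
RULE = re.compile(r"^(hiv|aids)($|[_\-\./][a-z_\-\./]*$)")
TARGET = "hiv-aids-treatment.php"


def rewrite(path):
    """Return the rewrite target if the rule matches, else None (hypothetical helper)."""
    return TARGET if RULE.search(path) else None


should_match = ["hiv", "hiv-", "hiv_es", "hiv.adasdasdasd"]
should_not_match = ["hive", "hives", "hive-asdasdasd", "hive_"]

for path in should_match + should_not_match:
    print(f"{path!r} -> {rewrite(path)}")
```

New test paths can be added to either list before changing the real .htaccess file.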

If you're not getting the same results and/or you're getting an error, then more than likely, there is a problem with the way your environment is set up.

Anyway, please let me know what happens. Thanks.


It works perfectly!

And thank-you so much for all your work helping me.

I know you like to solve problems but even so.

I probably would not have put in the time to learn it at this level to get such a complicated rule working.

 

Not to waste your time, but this got me thinking last night: What type of person develops something like regex? I mean we are programmers and tend to think a certain way that others do not, unless they are engineers. But most of us are lucky just to be able to figure out how to use regex. What kind of a mind develops it? This is a rhetorical question, of course.


The short answer to who invented regular expressions is some very smart people.

A slightly more interesting and detailed answer can be found here:

http://blog.stevenle...s/regex-legends

 

On that list, Ken Thompson and Larry Wall are probably the two biggest contributors to modern regular expressions.

I think the important thing to understand though is that the concept of finding patterns in text using computers is something that's been around since the beginning, and attempts to do just that were the precursors to regexes.

 

There's actually a really famous quote surrounding regexes that is quite amusing. It goes:

Some people, when confronted with a problem, think, “I know, I'll use regular expressions.” Now they have two problems.

 

Anyway, I'm glad you got everything working. Good luck with the rest of your project.
