abigail Posted July 28, 2012

Maybe you know if I can do this. I want the urticaria rule to be hives, but the hiv-aids rule always catches it. It won't even let me use hives-treatment.php as the filename. Here are the two rules:

RewriteRule ^urticaria[a-z_\-\./]*$ urticaria-treatment.php
RewriteRule ^hiv|aids[a-z_\-\./]*$ hiv-aids-treatment.php

So this rule goes to hiv-aids:

RewriteRule ^urticaria[a-z_\-\./]*$ hives-treatment.php

If it can't be done, it's not the end of the world. What I really want is for this rule to go to hives, but even the rule above would be better than the filename urticaria:

RewriteRule ^hives[a-z_\-\./]*$ hives-treatment.php

Maybe there is a way to tell the hiv-aids rule to ignore "hive". Also, my code shows the URL ending with hiv-aids-treatment.php, so I can't disallow the letter "e".
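A quick way to see why "hives" keeps landing on the hiv-aids page is to run the posted pattern through a regex engine outside Apache. This Python sketch assumes that Python's re module treats these simple patterns the way Apache's PCRE does. It also assumes the tested string is the requested path without the leading slash, which is what a RewriteRule in a site-root .htaccess sees. The helper name matches is made up for illustration.

```python
import re

# The hiv-aids pattern exactly as posted. "|" has the lowest precedence,
# so this reads as (^hiv) OR (aids[a-z_\-\./]*$), not ^(hiv|aids)...
HIV_AIDS = r"^hiv|aids[a-z_\-\./]*$"
# Grouping the alternatives alone is not enough: "es" is still allowed
# by the character class, so "hives" matches this version too.
GROUPED = r"^(?:hiv|aids)[a-z_\-\./]*$"


def matches(pattern: str, path: str) -> bool:
    """True if pattern matches anywhere in path (unanchored search)."""
    return re.search(pattern, path) is not None


for path in ["hiv", "aids", "hives", "hives-treatment.php", "urticaria"]:
    print(f"{path!r:24} original={matches(HIV_AIDS, path)} "
          f"grouped={matches(GROUPED, path)}")
```

Because "hives-treatment.php" also starts with "hiv", the original pattern matches that filename too. As far as I understand, in .htaccess context a rewritten path is run through the rules again, which would explain why that filename gets caught.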
HartleySan Posted July 28, 2012

If "hiv" is never followed by an "e", you could change the one regex as follows:

^hiv[^e]|aids[a-z_\-\./]*$
abigail Posted July 28, 2012

It only partially worked. It did let hives be recognized, but now it doesn't recognize hiv at all. Of course, through code I use hiv-aids-treatment, which works. But I also want the user to be able to type in hiv or aids and get to hiv-aids. These are the three I tried, with the same results (hiv gives a 404 error):

RewriteRule ^hiv[^e]|aids[a-z_\-\./]*$ hiv-aids-treatment.php
RewriteRule ^hiv[^e][a-z_\-\./]|aids[a-z_\-\./]*$ hiv-aids-treatment.php
RewriteRule ^hiv[^ea-z_\-\./]|aids[a-z_\-\./]*$ hiv-aids-treatment.php
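The 404 for a bare "hiv" follows from how a negated character class works. Here is a small Python check, under the same assumption that Python's re behaves like PCRE for these patterns. The helper matches is again hypothetical.

```python
import re

# HartleySan's first suggestion: "hiv" followed by one character that is not "e".
NOT_E = r"^hiv[^e]|aids[a-z_\-\./]*$"


def matches(pattern: str, path: str) -> bool:
    return re.search(pattern, path) is not None


# [^e] means "exactly one character other than e", not "anything but e, or
# nothing", so a bare "hiv" leaves nothing for it to match. The other two
# variants tried above also need at least one character after "hiv".
for path in ["hiv", "hives", "hiv-aids-treatment.php", "aids"]:
    print(f"{path!r:26} -> {matches(NOT_E, path)}")
```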
HartleySan Posted July 29, 2012

The more I think about this, the trickier it is, in the sense that I don't know exactly what you want. To clarify, is the following what you want?

- "hiv.php" is okay.
- "hiv" followed by lowercase letters, underscores, hyphens, periods, and forward slashes (except for the letter combo "es" right after "hiv") and then ".php" is okay.
- "aids" followed by lowercase letters, underscores, hyphens, periods, and forward slashes and then ".php" is okay.

That seems to be what you're asking. It is very much possible, but I think doing that has two downsides:

1) Your regex becomes more complex and slower.
2) Adding one-off exceptions like "hiv" not followed by "es" is not a good way to future-proof your regex against combinations you have not yet considered.

As a result, I might change the regex slightly to reflect something like the following: "hiv" and "aids" cannot be immediately followed by an English letter or underscore, but they can be followed by a hyphen, period, or forward slash, which can then be followed by lowercase letters, underscores, hyphens, periods, and forward slashes.

To me, that is more practical and simpler, and it will better future-proof your script. What do you think?
abigail Posted July 29, 2012

I should point out that I only understand a small amount of regex. The reason it is the way it is now is that I got it working step by step, mostly by looking online and also from a previous post I had here. So changes are good, provided it ends up working.

Here, then, is what I am doing overall. I have many files (webpages), which I am adding to, and there might be about 50 or even more in the end. I could post the current file if you need it. For SEO and to make the URL prettier, I am using mod_rewrite. Because I am doing this, it is also very convenient to let the user type in just that name directly and get there that way. So that is the secondary goal.

All this was working well until the conflict between hives and hiv-aids came up. I had a similar conflict before with cancer and cancer-symptoms, but I solved that by using c-symptoms instead.

Generally, what I do is this. My code will issue:
http://adviceofthequeen.com/cancer-treatment
which mod_rewrite sends to cancer-treatment.php. Then my user can type in:
adviceofthequeen.com/cancer
which mod_rewrite also sends to cancer-treatment.php. In some cases I let them shorten it; for leukemia, for example, they only have to type leuk.

So for hiv-aids, my code will issue:
http://adviceofthequeen.com/hiv-aids-treatment
which mod_rewrite sends to hiv-aids-treatment.php. The user should be able to type either:
adviceofthequeen.com/hiv
or:
adviceofthequeen.com/aids
and mod_rewrite sends it to hiv-aids-treatment.php.

I don't expect anyone would enter hiv.php or aids.php, so the code wouldn't have to support that. But it seems reasonable to me that if someone types in hiv-aids-treatment.php, that file should be displayed, because that file exists.

I read online that I should be able to disallow ^(hive), but it didn't work for me, and maybe I didn't know where to place it.

Let me know if it is still unclear. And thanks for taking the time to try to figure this all out.
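One way to express "hiv, but not hive" is a negative lookahead. This is a sketch of that idea, not something tested in this thread. The pattern LOOKAHEAD is hypothetical, and although Apache's PCRE supports (?!...) syntax, its behavior is only checked here with Python's re. It is also exactly the kind of one-off exception HartleySan warns about in his July 29 post: it excludes only an "e" right after "hiv", so something like "hivx" would still match.

```python
import re

# (?!e) succeeds only if the next character is not "e". Unlike [^e], it
# consumes no character, so it also succeeds at the end of the string.
LOOKAHEAD = r"^(?:hiv(?!e)|aids)[a-z_\-\./]*$"


def matches(pattern: str, path: str) -> bool:
    return re.search(pattern, path) is not None


for path in ["hiv", "aids", "hiv-aids-treatment", "hiv-aids-treatment.php",
             "hives", "hives-treatment.php", "hivx"]:
    print(f"{path!r:26} -> {matches(LOOKAHEAD, path)}")
```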
HartleySan Posted July 30, 2012

abigail, I applaud you for taking the time to learn how to use regexes. For me, regexes have been one of the most challenging aspects of computer science I have ever studied. I tried to learn them several times before I finally buckled down and learned them properly. With such a great challenge, though, comes a great sense of accomplishment and power when you finally learn them, which I'm sure you will gain too with some more practice.

Given what you're going for, I would probably simplify your regexes by saying that no English letters can immediately follow the target word/phrase, but after that, there can be English letters. That way, letters immediately after "hiv" will not be interpreted as referring to hiv-aids-treatment.php, but at the same time, if there is a hyphen, underscore, period or slash immediately after "hiv", then whatever comes after that will automatically be interpreted as pointing to hiv-aids-treatment.php. By doing this, you won't have any more issues with "hiv" vs. "hives".

As for the ambiguity with the cancer pages, you could write a regex that points http://www.adviceofthequeen.com/cancer to one of the two files, and then point some other URL to the other file, but I'm not sure I'd go that far. To me, while mod_rewrite does make typing URLs easier, its main benefit is making URLs easier to read. Nowadays, so few people actually type in long URLs (even if the URLs are simple) that the typing benefit is hardly there, in my opinion. To that end, if I saw cancer-treatment.php and cancer-symptoms.php, I'd have no problem with that. To me, mod_rewrite isn't even necessary.

Anyway, getting back to the actual regex, you could change the [a-z_\-\./]* part of your regexes to the following to do what I suggested:

[_\-\./][a-z_\-\./]*

I should also mention that I'm not trying to tell you how to do something; I just wanted to share my opinion as an avid website user.
abigail Posted July 30, 2012

Here is what I have now:

RewriteRule ^hiv[_\-\./]|aids[a-z_\-\./]*$ hiv-aids-treatment.php

It does everything except that if the user types hiv, it gives a 404; the user must type hiv-. Is there a way to say "or nothing"? I also don't understand why aids works but hiv doesn't. When I put the first bracket before the second bracket, it caught hives.

What I'm thinking is that hiv-aids is a much more sought-after condition than hives, so if I can't use hives, I will have to use urticaria. Because of what I wrote below, I just thought of an idea: could I use the filename urticaria-hives-treatment? It doesn't allow the user to type hives, but the URL looks better and is better for SEO.

Don't worry about trying to tell me what to do. Kicking around ideas is the best way to come up with the best method/solution.

As for my tackling regex, I really didn't want to. I knew what a headache it was going to be. But when I realized that mod_rewrite was the best thing I could do for SEO, and that it would be very significant for SEO, I went ahead anyway. Larry gives some info in the book, which got me started.

Here's the thing to understand: users typing in URLs is secondary. But now that I can provide that, it is really nice. For example, I can send you an email and say "look at my new page about cancer at adviceofthequeen.com/cancer". I don't even have to give a link, and I don't have to type in long or complicated filenames. In addition, someone might get to my site that way and then, instead of using my navigation, say to himself, "I wonder if she has treatment for leukemia," and he can just type leuk into the browser and get around lots of my website that way.

The real advantage, though, is SEO canonicalization. My code directs everything to one URL, /hiv-aids-treatment, no matter how it got there. And there is one more detail I forgot to mention: my code also adds #hiv-aids-treatment to the end, and Google would think that is a different URL.
So as I see it, there are only two ways to do it: either use only /hiv-aids-treatment.php (without #), or use /hiv-aids-treatment and make everything canonical to that.

The only issue I have, and apparently this can't be solved, is that once I have a rule that catches "cancer", I can't use "cancer" anywhere else. I can't even use it in the filename, so I can't use "cancer symptoms". That is why I abbreviated it, and in this instance it doesn't matter, because no one will be searching for "cancer symptoms treatment". There are others that collide, and I will solve most of them by combining them into a group. But here is one example: I am now adding Mental Sharpening, which is a major term. Later I will add Mental Retardation, which I will have to name retardation-treatment instead of mental-retardation-treatment, or even retardation-mental-treatment (unless I can use that one because "mental" is not the first word). It's not very significant for the user, but it is for Google. For example, because of drought we have retarded plant growth, etc., so Google could bring up my website for "plant retardation treatment".

Even so, all of these collisions are still minor. Most people would find these pages because they are already on my site or by word of mouth. The major ones are what I need Google to see.
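abigail's question of why aids works but hiv doesn't comes down to a single quantifier. This Python sketch makes the same assumption that Python's re matches PCRE here. OPTIONAL is a hypothetical variant, not something posted in the thread. It shows that making the separator optional would let "hives" back in, so "or nothing" has to mean "or the end of the path".

```python
import re

CURRENT = r"^hiv[_\-\./]|aids[a-z_\-\./]*$"
# Hypothetical attempt at "or nothing": make the separator optional.
OPTIONAL = r"^hiv[_\-\./]?|aids[a-z_\-\./]*$"


def matches(pattern: str, path: str) -> bool:
    return re.search(pattern, path) is not None


# The hiv branch requires exactly one separator after "hiv"; the aids branch
# ends in "*", which also accepts zero characters.
print(matches(CURRENT, "hiv"))     # False
print(matches(CURRENT, "hiv-"))    # True
print(matches(CURRENT, "aids"))    # True
print(matches(CURRENT, "hives"))   # False
# With "?", the hiv branch matches just the first three letters of "hives".
print(matches(OPTIONAL, "hives"))  # True
```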
HartleySan Posted July 30, 2012

Whoa! That was a lot to say all at once. I guess the first thing you need to do is decide exactly what you want your final URLs to look like, and then go from there.

Also, you mentioned that you can't have "/cancer" point to two separate scripts. That is correct, and it's just something you'll have to live with.

As for regexes, I understand that you are (begrudgingly) learning them because you need to, not because you want to. I was in the same boat as you, but now that I know them, I like them.

I should also have explained the regex in my previous post better, as it's not a complete solution by itself. I was kinda hoping you could get the rest from there. A more complete regex to handle everything you want would be something like the following:

^(hiv|aids)$|[_\-\./][a-z_\-\./]*$

Note that the "hiv|aids" part needs to be in parentheses, or else it won't work properly. The second vertical bar allows you to have only "hiv" or "aids" and then end the string there, or to have stuff after "hiv" and "aids", in which case the character immediately following cannot be an English letter. Basically, I think that's what you want, but again, you'll need to verify that.

As a final note (for this post, at least), you can add ?: after the left parenthesis of a capturing group to make it non-capturing, if that's necessary. In terms of regex engine efficiency, though, I don't know which one is quicker.
abigail Posted July 30, 2012

Well, that does not work. I believe it is a syntax error, because it doesn't find my style sheet. When I remove one or both of the $|, it can't find /hiv or /aids.

Maybe this isn't worthwhile. All this time spent. It's pretty minor, really.

Actually, it's not cancer going to two scripts that I want. I wanted:

/cancer goes to /cancer-treatment
/cancer-symptoms-treatment goes to /cancer-symptoms-treatment
or /cancer-symptoms could go to /cancer-symptoms-treatment

It will not let me do this. I'm pretty sure it wouldn't let me do this either:

/sympt goes to /cancer-symptoms-treatment

But these things are so minor, really. Look what it can do. It's working great for 99% of it, and good for the other 1%.

I did another test, and it will let me do:

/sympt goes to /symptoms-cancer-treatment

So this is another option I didn't think of. I could do:

/urtic goes to /urticaria-hives-treatment

and at least the user will see hives, and Google will see both.
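The /cancer versus /cancer-symptoms collision can be modeled with a toy rule table. This is a simplified Python model, not Apache's actual algorithm. It assumes each rule behaves as if it had the [L] flag, so the first match wins within a pass. It also assumes that, as in .htaccess context, the rewritten path is fed through the rules again until it stops changing. The patterns follow the shape used in this thread, but the tables and the rewrite helper are hypothetical. Under those assumptions, rule order decides whether "cancer" can appear in a second filename.

```python
import re

# Hypothetical rule tables: (pattern, target file), tried top to bottom.
BROAD_FIRST = [
    (r"^cancer[a-z_\-\./]*$", "cancer-treatment.php"),
    (r"^cancer-symptoms[a-z_\-\./]*$", "cancer-symptoms-treatment.php"),
]
SPECIFIC_FIRST = list(reversed(BROAD_FIRST))


def rewrite(rules, path, max_passes=10):
    """Toy model: in each pass the first matching rule wins (like [L]);
    the result goes through the rules again until it stops changing."""
    for _ in range(max_passes):
        new_path = path  # stays unchanged if no rule matches
        for pattern, target in rules:
            if re.search(pattern, path):
                new_path = target
                break
        if new_path == path:
            return path
        path = new_path
    return path


print(rewrite(BROAD_FIRST, "cancer-symptoms"))                # cancer-treatment.php
print(rewrite(BROAD_FIRST, "cancer-symptoms-treatment.php"))  # cancer-treatment.php
print(rewrite(SPECIFIC_FIRST, "cancer-symptoms"))             # cancer-symptoms-treatment.php
print(rewrite(SPECIFIC_FIRST, "cancer"))                      # cancer-treatment.php
```

Real mod_rewrite rules without [L] keep running in the same pass, so a broad rule placed later could still override a specific one. The mod_rewrite documentation is worth checking before relying on this ordering.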
HartleySan Posted July 30, 2012

Hmmm... that's weird. Granted, I only tested my regex using the JS test method, but it returned true for things like "hiv", "hiv_", "hiv-whatever", etc., and at the same time it returned false for things like "hive" and "hives".

Hmmm... I wonder if the regex syntax for mod_rewrite is stricter than it is for JS. I guess the only way to know for sure is to write my own mod_rewrite script.

Sorry for the bum regex. When I have time later, I'll write my own mod_rewrite script, test things out, and report back. Please be patient, and don't give up. I think we're close.
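Running the July 30 pattern through Python's re gives one possible explanation. The same results are expected from JavaScript and PCRE, since "|" has the lowest precedence in all three. The second alternative is tied neither to the start of the path nor to hiv/aids, so it matches any path containing a separator. That includes strings the quick JS test did not try. If the page requests its style sheet at a path such as style.css (a hypothetical name), this rule would also rewrite that request to hiv-aids-treatment.php, which may be why the style sheet stopped loading.

```python
import re

# The pattern from the July 30 post. "|" splits it into
# ^(hiv|aids)$  OR  [_\-\./][a-z_\-\./]*$ , and the second alternative can
# match at any position in the path.
FIRST_TRY = r"^(hiv|aids)$|[_\-\./][a-z_\-\./]*$"


def matches(pattern: str, path: str) -> bool:
    return re.search(pattern, path) is not None


for path in ["hiv", "hiv-whatever", "hives", "hive-asdasdasd",
             "style.css", "cancer-treatment.php"]:
    print(f"{path!r:24} -> {matches(FIRST_TRY, path)}")
```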
abigail Posted July 31, 2012

If you think it's worth doing, I won't give up. Too bad you have to take the time to write code. You could test my code, all except writing the .htaccess file; you would have to trust that I type in what you tell me to. Then you could see the results for yourself. I don't know if that would save you any time.

One thing to consider: my page did not use the style sheet, so I didn't test further. Maybe it would do as you say, but something is wrong anyway.
HartleySan Posted July 31, 2012

Well, we've come this far, so we shouldn't give up just because of a little roadblock. I will try to resolve this issue by the end of the day, so don't worry. Also, I enjoy doing this, so it's okay. Furthermore, I don't need to write much code to test it, so again, please don't worry.

As for the style sheets not loading, that sounds like a bad path and nothing else. One problem at a time, though, I suppose.
abigail Posted July 31, 2012

I don't really have to rush with this, because I have it working, just not ideally. I will test the style sheet further tomorrow, and maybe you want to wait and see what I find out before you do yours. The only other time I had a problem with the style sheet was with the final /, and I haven't been testing that for these cases. Another time I had an extra or missing space, and my website wouldn't load at all; it gave a 500 error. Otherwise I haven't had any problems with the style sheet while working with the rewrites.
HartleySan Posted July 31, 2012

The style sheet should not be affected by the mod_rewrite directives. The style sheet is directly related to the actual HTML being used to load it, and that's it. If it's not working, then that's an issue with an incorrect path.
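The final / that abigail mentions is one way to get the kind of bad path HartleySan describes. A browser resolves a relative href against the URL of the page that contains it, and Python's urllib.parse.urljoin follows the same resolution rules. In this sketch, "style.css" is a hypothetical href. With a trailing slash, the browser asks for /cancer/style.css, and that path would also match a rule shaped like ^cancer[a-z_\-\./]*$, assuming such a rule exists on the site.

```python
from urllib.parse import urljoin

# "style.css" stands in for a relative href such as <link href="style.css">.
for page in ["http://adviceofthequeen.com/cancer",
             "http://adviceofthequeen.com/cancer/"]:
    print(page, "->", urljoin(page, "style.css"))

# A root-relative href ("/style.css") resolves the same way from either URL.
print(urljoin("http://adviceofthequeen.com/cancer/", "/style.css"))
```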
HartleySan Posted July 31, 2012

I just tested it out and found one small problem with my regex. Apparently the Apache regex engine is in fact stricter than the JS one: I needed an extra set of parentheses to make it work. Anyway, here's the final regex I used:

^(hiv|aids)($|[_\-\./][a-z_\-\./]*$)

And here are the two files I created for testing purposes:

.htaccess file:

RewriteEngine on
RewriteRule ^(hiv|aids)($|[_\-\./][a-z_\-\./]*$) hiv-aids-treatment.php

hiv-aids-treatment.php file:

<?php
echo 'This is hiv-aids-treatment.php.';
?>

I then put both of those files in the same directory and tested it in XAMPP, and it seems fine. All of the following worked:

hiv
hiv-
hiv_es
hiv.adasdasdasd

And the following did not:

hive
hives
hive-asdasdasd
hive_

Basically, it worked the way it should. If you're not getting the same results and/or you're getting an error, then more than likely there is a problem with the way your environment is set up. Anyway, please let me know what happens. Thanks.
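The XAMPP test lists can be approximated outside Apache with Python's re module, assuming it agrees with PCRE for this pattern. The extra cases "aids", "hiv-aids-treatment.php" and "style.css" are not from the original lists. FINAL_NC is the same pattern with the non-capturing (?:...) groups mentioned in the July 30 post, and it accepts and rejects exactly the same paths.

```python
import re

# Final pattern: "hiv"/"aids" must be followed either by the end of the
# path or by a separator, then any mix of allowed characters.
FINAL = r"^(hiv|aids)($|[_\-\./][a-z_\-\./]*$)"
# Same pattern with non-capturing groups.
FINAL_NC = r"^(?:hiv|aids)(?:$|[_\-\./][a-z_\-\./]*$)"

should_match = ["hiv", "hiv-", "hiv_es", "hiv.adasdasdasd",
                "aids", "hiv-aids-treatment.php"]
should_not_match = ["hive", "hives", "hive-asdasdasd", "hive_", "style.css"]

for pattern in (FINAL, FINAL_NC):
    for path in should_match:
        assert re.search(pattern, path), (pattern, path)
    for path in should_not_match:
        assert not re.search(pattern, path), (pattern, path)
print("all checks passed")
```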
abigail Posted July 31, 2012

It works perfectly! Thank you so much for all your work helping me. I know you like to solve problems, but even so. I probably would not have put in the time to learn regex well enough to get such a complicated rule working.

Not to waste your time, but this got me thinking last night: what type of person develops something like regex? I mean, we are programmers and tend to think in a certain way that others do not, unless they are engineers. But most of us are lucky just to be able to figure out how to use regex. What kind of mind develops it? This is a rhetorical question, of course.
HartleySan Posted July 31, 2012

The short answer to who invented regular expressions is: some very smart people. A slightly more interesting and detailed answer can be found here:

http://blog.stevenle...s/regex-legends

On that list, Ken Thompson and Larry Wall are probably the two biggest contributors to modern regular expressions. I think the important thing to understand, though, is that the idea of using computers to find patterns in text has been around since the beginning, and attempts to do just that were the precursors to regexes.

There's actually a really famous quote about regexes that is quite amusing. It goes:

Some people, when confronted with a problem, think, “I know, I'll use regular expressions.” Now they have two problems.

Anyway, I'm glad you got everything working. Good luck with the rest of your project.