Dimitri Vorontzov Posted February 14, 2013
Sure, Larry, I understand – and after all, it's your forum, isn't it? You can even ban me from it at the click of a button whenever you wish. But try to look at it from a different perspective. What if someone is searching Google for "regex to validate email" and finds this already popular thread in your forum? They will be attracted by the discussion, and then they may become curious: "Who is this guy, Larry Ullman? Oh, interesting, he wrote quite a few books on PHP! And on other languages! Why don't I check them out?" And then that person will buy your books, read them, learn from them, and become a better web developer and programmer, and in the end there will be one more person in the world liking and respecting you for your teaching and writing abilities.
Is this a quixotic pursuit? Most likely, yes. It may even be herculean. But I think it's worth it. On the other hand, learning to forge effective regular expressions: is that a quixotic pursuit? I don't think so. I think it's actually a very reasonable and rational pursuit.
Jonathon Posted February 14, 2013
I liked your comment purely for the word "quixotic". That's a keeper. http://regexlib.com/
Dimitri Vorontzov Posted February 14, 2013
Thanks, Jonathon! I wasn't aware of regexlib.com – it's an excellent resource, and I appreciate your posting it. It's Larry who deserves full credit for "quixotic".
Jonathon Posted February 14, 2013
It's a great resource. Yes, I know, that's why I up-voted Larry's post with it in.
HartleySan Posted February 15, 2013
Dimitri, I think Larry has already provided you with more than enough links and information to give you what you want. Beyond what he has already given you, what is it that you still want to know? Please be specific, because I simply can't understand what else you want or need. All the info you need is already available from what Larry has given you (which is an awful lot, I think).
Antonio Conte Posted February 15, 2013
Sometimes it's not the end of the journey that's important, but rather the journey itself. I get that part completely, Dimitri. That being said, you've got to respect other people's time. Larry moving on from this thread once a satisfying answer has been provided can't really be looked down upon. However, especially as a student and hobby programmer, I can appreciate the interest. Sometimes those interests must be pursued on your own. Most people don't have the time, even if they'd probably want to and find the topic interesting. I'm not really interested in regexes, and I definitely don't have the knowledge to help you. Just wanted to rant a little, as usual... and give you a little nod.
Dimitri Vorontzov Posted February 15, 2013
Antonio, point taken, thank you. Ditto HartleySan, but to answer your question about what else I want to know, it's this:
I'm well aware I'm caught in the newbie trap of trying to invent the perfect regex to match an email address. But my goal isn't to match all possible versions of an email address; I just want to figure out a regex that would work with a majority of normal email addresses. The purpose of that "quixotic pursuit" is not to validate emails in any practical application, but rather to master the PHP flavor of regex, using email validation as an example, for lack of a better one – and I'm stuck with that goal, because the chapter on regex in Larry's book is where that book suddenly became challenging for me. So if Larry wants to bail out, it's fine with me, and it's definitely his right, but I can't. Does this make any sense to you?
So, back to the "what do I still want to know" question: I'm actually satisfied with the part of the most recent version of that regex that comes after the @. I looked far and wide, and for the life of me I can't come up with a valid domain name that wouldn't be matched by that part, so I'm okay with it.
What comes before the @, however, I don't like at all. It sucks. It would validate anything: _______@somewebsite.com, .@somewebsite.com, and so on. As I said, I want it to validate a reasonable majority of normal email addresses. So I want to figure out ways to improve it, and that's why I ask questions about it. I could, of course, be doing that on some other guy's forum, but since it's Larry's book that I'm studying, it's only logical to ask the questions here.
Help from someone knowledgeable would be appreciated, even though I obviously can't insist.
HartleySan Posted February 15, 2013
The first time I read this book, I also got stuck on the regex chapter (and I ended up leaving it for the longest time before I finally went back to it and really learned it). It's probably the hardest single chapter in the book. In addition, I totally understand your interest in pursuing a good regex for educational reasons. That's fine. What I can't understand is the following:
"I just want to figure out the regex that would work with a majority of normal email addresses."
This is a very relative and arbitrary thing to say. To be honest, I get the feeling that you yourself don't know exactly what you mean by it. That's why I think you should take the time to sit down and think critically about what you want while taking notes. To give an example, you might decide something like, "The local part of the address can contain underscores, but it must start with and contain a letter." After you clearly and explicitly define what you want, you'll have a much better chance of getting it.
Please take some time to think carefully about this, do the proper research, and then come back and let us know what you want if you can't get it. At this point, though, I think you have all the info you need, but more than anything, you need to take the time to really sit down and think about it a lot. Slowly, it'll all start to make sense, and you should be all right after that.
To help guide you a bit, I have a feeling that the main thing you want to know that you don't know yet has to do with lookarounds. Using lookarounds, you can write regexes for things like, "The string must contain at least one letter and one number, and they can be in any order." For more info about lookarounds, please see the following: http://www.regular-expressions.info/lookaround.html
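To make the lookaround idea a little more concrete, here is a minimal sketch in PHP. The rule set is only an assumption made up for practice (at least one letter, no leading or trailing dot, otherwise the characters matched by [\w.-]), and the function name local_part_ok() is hypothetical. It is not meant as a definitive rule for real email addresses.

```php
<?php
// Hypothetical practice rules for the local part (the text before the @):
//   (?=.*[A-Za-z])  lookahead: at least one letter somewhere
//   (?!\.)          negative lookahead: must not start with a dot
//   [\w.-]+         letters, digits, underscores, dots, hyphens
//   (?<!\.)         negative lookbehind: must not end with a dot
//   \z              true end of string (no trailing newline allowed)
function local_part_ok(string $local): bool
{
    $pattern = '/^(?=.*[A-Za-z])(?!\.)[\w.-]+(?<!\.)\z/';
    return preg_match($pattern, $local) === 1;
}

var_dump(local_part_ok('john.doe')); // bool(true)
var_dump(local_part_ok('_______'));  // bool(false): no letter
var_dump(local_part_ok('.'));        // bool(false): no letter, starts with a dot
var_dump(local_part_ok('john.'));    // bool(false): ends with a dot
```

The lookarounds check conditions without consuming characters, which is why several independent requirements can be stacked in front of one ordinary character class.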
Dimitri Vorontzov Posted February 15, 2013
Thanks, HartleySan, and you're absolutely right: I'm now attempting to define a set of requirements that would validate a reasonably large share of the most probable forms of whatever comes before the @ in an email address, and I'm right in the middle of research on that. Obviously, I'm not just sitting around waiting for someone to solve my problems: for the last few days I've mostly been doing research on regex. Thanks for the resource on lookarounds, it's indeed very valuable!
HartleySan Posted February 21, 2013
Dimitri, if you haven't seen it already, I highly recommend checking out the following link: http://diveintohtml5.info/forms.html#validation
Dive Into HTML5 is one of my favorite web resources, and the link above really hits the nail on the head. The links on that page are really great as well. Enjoy the read!
ericp Posted December 12, 2013
I have a question about email matching with regular expressions. Chapter 14, Pg. 445, contains the following email matching pattern:
^[\w.-]+@[\w.-]+\.[A-Za-z]{2,6}$
I may be wrong, but wouldn't it match something like this?
somename@some_website.com
I would say no, you're not wrong: it would match, because the shortcut \w after the @ certainly does allow (and match) letters, numbers, and the underscore, according to the PHP manual. So we cannot exclude the underscore if we still want to use \w immediately after the @ sign. If you don't want the underscore after the @ sign, you may follow HartleySan's character class, [A-Za-z0-9]. However, that character class will reject a valid email address such as somename@some-website.com, because it contains a hyphen (-).
So, to my understanding of this topic, if we want to allow letters, numbers, and hyphens, but not underscores, between the @ and the . in an email address, as per the original quest for the 'perfect' one, we can only go with a class like [-A-Za-z0-9] or [A-Za-z0-9-], with the hyphen first or last and no space inside the brackets (it does no harm if you want to escape the hyphen as per Larry's syntax above).
Besides, we could try a class like [\w.-]+[^_] between the @ and the ., although that only keeps the final character before the dot from being an underscore.
Hope this helps!
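The behavior of \w and the character classes discussed in this post can be checked directly. The following is a small sketch, not a recommendation: $bookPattern is the Chapter 14 pattern quoted above wrapped in / delimiters, while $noUnderscore, $negatedTail, and the helper email_matches() are names invented here for illustration.

```php
<?php
// The pattern quoted from the book, wrapped in PCRE delimiters.
$bookPattern = '/^[\w.-]+@[\w.-]+\.[A-Za-z]{2,6}$/';

// Illustrative variant: letters, digits, dots, and hyphens after the @, but no underscore.
$noUnderscore = '/^[\w.-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}$/';

// The [\w.-]+[^_] idea: only the last character before the final dot is constrained.
$negatedTail = '/^[\w.-]+@[\w.-]+[^_]\.[A-Za-z]{2,6}$/';

// Hypothetical helper: true when the pattern matches the whole address.
function email_matches(string $pattern, string $email): bool
{
    return preg_match($pattern, $email) === 1;
}

var_dump(email_matches($bookPattern, 'somename@some_website.com'));  // bool(true)
var_dump(email_matches($noUnderscore, 'somename@some_website.com')); // bool(false)
var_dump(email_matches($noUnderscore, 'somename@some-website.com')); // bool(true)
var_dump(email_matches($negatedTail, 'somename@some_website.com'));  // bool(true)
```

The last line shows why a trailing negated class is not enough to keep underscores out of the domain: the underscore is still consumed by the [\w.-]+ that precedes it.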
HartleySan Posted December 12, 2013
Yes, good points, Eric. Thanks.
Antonio Conte Posted December 12, 2013
What is a valid email address? That's a really important question. I'm actually allowed to create the following email addresses on my host:
_@juvenorge.com
_._@juvenorge.com
!@juvenorge.com
#@juvenorge.com
$@juvenorge.com
=@juvenorge.com
?@juvenorge.com
^@juvenorge.com
I don't know if these are regarded as valid generally, but that being said, I can create them and probably send and receive emails from them. That at least brings up a few interesting questions for you to answer.
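For anyone curious how the book's Chapter 14 pattern from earlier in this thread treats the addresses listed above, a quick check along these lines can help. This is only a sketch: $bookPattern is that pattern wrapped in / delimiters, and the variable names are made up for the example.

```php
<?php
// The Chapter 14 pattern discussed earlier in the thread.
$bookPattern = '/^[\w.-]+@[\w.-]+\.[A-Za-z]{2,6}$/';

// Single quotes matter here: '$@...' must not be treated as a PHP variable.
$addresses = [
    '_@juvenorge.com',
    '_._@juvenorge.com',
    '!@juvenorge.com',
    '#@juvenorge.com',
    '$@juvenorge.com',
    '=@juvenorge.com',
    '?@juvenorge.com',
    '^@juvenorge.com',
];

// Map each address to true (pattern matches) or false (pattern rejects it).
$results = [];
foreach ($addresses as $address) {
    $results[$address] = preg_match($bookPattern, $address) === 1;
}

foreach ($results as $address => $accepted) {
    echo str_pad($address, 20), $accepted ? 'accepted' : 'rejected', PHP_EOL;
}
```

Only the two underscore-based addresses pass, because [\w.-] covers letters, digits, underscores, dots, and hyphens but none of the other punctuation, even though the host evidently allows it.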
HartleySan Posted December 12, 2013
Yep. As one of the top answers on Stack Overflow about validating email addresses says, it's actually impossible to do completely with a regex alone. And really, what matters is not so much the characters used in the email address as the fact that the address is real and is hopefully one that the user actually owns. Anyway, I say use filter_var and move on.
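For reference, the filter_var approach mentioned here takes only a line or two. A minimal sketch, where the wrapper name is_plausible_email() is an assumption made for the example:

```php
<?php
// filter_var() with FILTER_VALIDATE_EMAIL returns the address on success
// and false on failure, so compare against false explicitly.
function is_plausible_email(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_plausible_email('user@example.com'));  // bool(true)
var_dump(is_plausible_email('not-an-email'));      // bool(false)
var_dump(is_plausible_email('user@@example.com')); // bool(false)
```

This only says the string looks like an email address; it says nothing about whether the mailbox exists or belongs to the person typing it.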
ericp Posted December 12, 2013
Yes, they can be regarded as valid ones. In my opinion, a valid email address can be seen as any kind of virtual address (compare it to a physical home address) that someone creates for contact with other people, and it is also a means of exchanging digital messages with one another. So, as long as it meets the universal, programmatic structure of exactly one @ sign and at least one . (dot) after it, it is considered a valid one. And a valid email address could never be a dead one, only a live one. I mean that a valid email address means nothing to computers, but it means something to human beings. Therefore, your hosting company (human beings) may think that different people prefer to have different kinds of email addresses. So they tolerate the use of non-alphanumeric characters and tell the computers (machines) to accept them.
So, as a programmer, I think that when he wants to validate an email address, he must:
1/ validate and sanitize the email structure syntactically (for example, there is no more than one @ sign in the structure).
2/ ask the email users or email servers to confirm the validity and reality of the email (there is at least one person possessing it).
Then he is successful... right?
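The two steps described here might be sketched as follows. Everything in this example is an assumption made for illustration: the function names, the example.com link, and the use of random_bytes() (which needs PHP 7 or later). Actually sending the message and storing the pending record are left out.

```php
<?php
// Step 1: syntactic check. Step 2: prepare a confirmation token that would be
// emailed to the address; only the owner of the mailbox can send it back.
function start_email_confirmation(string $email): ?array
{
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return null; // fails the syntactic check, nothing to confirm
    }

    $token = bin2hex(random_bytes(16)); // 32 hexadecimal characters

    return [
        'email' => $email,
        'token' => $token,
        // Hypothetical URL; example.com stands in for the real site.
        'link'  => 'https://example.com/confirm.php?token=' . $token,
    ];
}

// Called when the user follows the emailed link: compare the stored token
// with the submitted one using a timing-safe comparison.
function confirm_email(array $pending, string $submittedToken): bool
{
    return hash_equals($pending['token'], $submittedToken);
}
```

With this split, the regex or filter only has to catch obvious typos, and the confirmation step carries the real burden of proving the address is live.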
Antonio Conte Posted December 12, 2013
I would agree with that. The second step with general validation is what I usually go for.
phpstuff Posted January 10, 2014
"Dimitri, this regular expression allows for all possible valid email address syntaxes while not allowing any invalid email address:
[A single regular expression several thousand characters long was quoted here. The forum software corrupted its text (for example, (??: appears where (?:(?: was evidently meant), so it is not reproduced.]
That does not allow for comments in email addresses, though, which are technically allowed. This is why most developers either use filter_var() or use a minimal regular expression that just catches obvious fakes."
LOL... sorry, I almost felt something like that coming, and then it came, better than I expected.