Larry Ullman's Book Forums

"You don't have to approve, as long as you validate." It's a motto I've been living by for a couple of years (not perfectly translated). It's more important to ensure security and usability than validity.

 

You can't really prevent users from typing in bad characters, so as long as it does not pose a threat to security, leave it be.


Define "perfection". I can add juvenorge.no as a domain name, and you have no way of knowing whether that is real. When you are pretty sure the email is valid, and when it does not break functionality or security, leave it at that. Getting it "perfect" is really not an option unless you want to study regexes.

 

I know the feeling, but trust me: leave it at "good enough for the purpose."


You have a point here, Antonio.

 

Still: perfection, in regex, I guess, is when it does exactly what it is meant to. If a regex is written to check for a valid email address, it's supposed to check for a valid email address, no more, no less. This one doesn't do the job. I want the one that does.

 

Yes, a user can submit something that is not an actual email address. But I don't want that user to submit something that is not a valid email address. There's a difference. When things don't do what they are supposed to be doing, next thing that usually happens, the entire civilization crumbles. We mustn't let that happen. 

 

Curiosity won't let me sleep this night: how can we remove the damn _ from \w?


Dimitri, the best solution is the one Edward proposed: use the PHP filter_var function (http://php.net/manual/en/function.filter-var.php).

However, if you're insistent on writing your own regex and you want the shorthand character class \w without an underscore, then the following character class will achieve exactly that:

[A-Za-z0-9]

 

As the following page notes, \w is the shorthand way of writing the character class [A-Za-z0-9_], so by simply removing the underscore from the "longhand" character class, you will get what you want:

http://www.regular-expressions.info/charclass.html
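
A quick way to see the difference, as a minimal sketch: the snippet below runs a few single characters through both classes with preg_match(). The sample characters are arbitrary, and the sketch assumes the default C locale and no /u modifier, where \w covers only ASCII letters, digits, and the underscore.

```php
<?php
// Sample characters are arbitrary; they are not taken from the thread.
$samples = ['a', 'Z', '7', '_', '-'];

foreach ($samples as $ch) {
    $inWord  = preg_match('/^\w$/', $ch) === 1;           // letters, digits, underscore
    $inAlnum = preg_match('/^[A-Za-z0-9]$/', $ch) === 1;  // letters and digits only
    printf("%s  \\w: %-3s  [A-Za-z0-9]: %s\n", $ch, $inWord ? 'yes' : 'no', $inAlnum ? 'yes' : 'no');
}
```

Only the underscore row should differ between the two columns.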

 

That answer your question?

 

As a corollary, I understand your desire to have only valid email addresses as opposed to email addresses that won't cause any harm to your site, but at the end of the day, as Antonio stated, valid or not, people can still easily put in nonexistent email addresses, and there's nothing you can do about it (within reason). To that end, as Edward already stated (and I am reiterating), please just use the filter_var function and be done with it.
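
For reference, a minimal sketch of the filter_var approach might look like the following. The function name isEmailSyntaxOk is made up for this example; filter_var() and FILTER_VALIDATE_EMAIL are the real PHP built-ins. Note that this only checks syntax, not whether the mailbox exists.

```php
<?php
// Hypothetical wrapper name; only filter_var() and FILTER_VALIDATE_EMAIL are PHP built-ins.
function isEmailSyntaxOk(string $email): bool
{
    // filter_var() returns the filtered string on success and false on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(isEmailSyntaxOk('somename@some-website.com')); // bool(true)
var_dump(isEmailSyntaxOk('not an email'));              // bool(false)
```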


Thanks, HartleySan!

 

Again, I'm fully with you and Antonio and Edward, as far as using filter_var, and not having to be perfect in most practical situations.

 

But purely theoretically – this is a forum, dedicated to honing one's programming skills, and we can allow ourselves to play a little, can't we?  

 

So, my question to you, HartleySan. The solution that you proposed: [A-Za-z0-9] - am I correct in thinking that it would eliminate the following perfectly valid email address as invalid? 

 

somename@some-website.com

 

And if yes, as it seems to be, then what would be a better solution? 


Dimitri, this regular expression allows for all possible valid email address syntaxes while not allowing any invalid email address:

 

 

 

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
 \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
 \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
 \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*) 

 

That does not allow for comments in email addresses, though, which are technically allowed. This is why most developers either use filter_var() or use a minimal regular expression that just catches obvious fakes.

Hm... I think I can see how this may work, but something seems missing here. Aren't you supposed to have * instead of + after  [000-\031]?

 

Seriously though, Larry, I appreciate your humorous (even if slightly extreme) input very much, but... let's think calmly. The chapter on regular expressions in your book is teaching how to use regular expressions, not filter_var() function.

 

My question is about regular expressions, not filter_var() function. I think it's fair of me to treat this forum with respect as a trustworthy source of information, ask questions about things that interest me, related to the book I'm trying to learn from, and expect straight answer from knowledgeable people. 

 

I'm sure it doesn't have to be so hard. I'm not asking for a good alternative for a regex from that chapter. I'm asking about a minor problem, related to regex specifically, a solution to which I'm sure is clear to you, experienced coders, but is not clear to me. 

 

I am NOT asking a question about how to validate emails, in general.

 

I'm asking this: 

 

"How to create a regex pattern that would cover things that happen between @ and . in an email address?"

 

Or, to rephrase it: "what is the proper regex pattern that includes: letters, numbers, period sign, and a dash"? 

 

You guys know your regex; I'm new to it. Help would be much appreciated. 


Dimitri, you seem to have had a rather emotional reaction to my response and I have no idea why. I was neither being humorous nor extreme. Also, I'm thinking calmly. That is the regular expression for validating email addresses completely. There is no cause for you to get defensive and chippy. Obviously it was not clear to me or to anyone else that you're just asking "what is the proper regex pattern that includes: letters, numbers, period sign, and a dash" and not "how to validate an email address". You don't need to berate me (or the others) for not being helpful enough. Simply clarifying your question will suffice. 

 

So the question is "what is the proper regex pattern that includes: letters, numbers, period sign, and a dash" and the answer is [A-Za-z0-9\.\-]. That is: letters (upper and lowercase), the numbers, the period, and a dash.  
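
As a rough sketch of how that class could be used on its own, the helper below (hasOnlyDomainChars is a hypothetical name, not something from the book) anchors the class with ^ and $ and a + quantifier so that every character must belong to it:

```php
<?php
// Hypothetical helper (not from the book): true when $s contains only
// letters, digits, periods, and dashes.
function hasOnlyDomainChars(string $s): bool
{
    // ^ and $ anchor the match so every character must belong to the class.
    return preg_match('/^[A-Za-z0-9\.\-]+$/', $s) === 1;
}

var_dump(hasOnlyDomainChars('some-website.com')); // bool(true)
var_dump(hasOnlyDomainChars('some_website.com')); // bool(false): underscore is not in the class
```

A class like this only restricts which characters may appear. It says nothing about their order, so a string such as "..--" would also pass.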

 

If this leads to a new question, then ask it plainly and clearly, without the exposition on your expectations of the forum or of us.


Moreover, Dimitri, look at what you said earlier:

 

"I'm in pursuit of perfection with this one" and "Yes, a user can submit something that is not an actual email address. But I don't want that user to submit something that is not a valid email address. There's a difference. When things don't do what they are supposed to be doing, next thing that usually happens, the entire civilization crumbles. We mustn't let that happen. "

 

So my previous answer, with the most technically correct regular expression, was exactly responding to your stated desire. I don't see why you would turn around and chastise me for giving you the perfect regular expression for validating an email address, per your stated request. Not only did I provide you the absolutely correct regex (which you also could have found by searching online), but I gave you some context as to why trying to define a "perfect" email validator is impractical. 

 

All things considered, I find your reaction to my post to be inappropriate and unjustified.


Not berating at all: I respect everyone on this forum, and personally you, Larry. I just thought my question was clear from the beginning, namely:

 


 

 

 

I have a question about email matching with regular expressions. 

 

Chapter 14, Pg. 445, contains the following email matching pattern:

 

^[\w.-]+@[\w.-]+\.[A-Za-z]{2,6}$

 

I may be wrong, but wouldn't it match something like this?

 

somename@some_website.com

 

 

What would be the best way to improve it? 

 

So I was a little taken aback by what I perceived as all the guys ignoring or misinterpreting the actual question, and persistently offering a workaround instead. I should have blamed myself instead and rephrased the question right away, but I truly thought what I was asking about was clear from the start. 

 

I also honestly thought that you were joking, when you posted the huge chunk of code above. I thought you implied the absurdity of my "quest for perfection" by posting an absurdly long (and not real) regex. As I mentioned before, I'm new to regex. 

 

One way or another, your follow-up answer is exactly what I was looking for, Larry, and I thank you very much for it. Also, you're right: it does lead to a new question, and not one, but two. And here they are:

 

First Question: 

 

The initial regex, as offered in the book, is:

 

^[\w.-]+@[\w.-]+\.[A-Za-z]{2,6}$

 

My problem with it was that it would consider some_website.com as valid.

 

The modified version, which would not validate some_website.com, looks like this:

 

^[\w.-]+@[A-Za-z0-9\.\-]+\.[A-Za-z]{2,6}$

 

This seems to cover pretty much everything, namely:

 

At least one upper- or lower-case letter, digit, period, underscore, or dash, followed by one and only one @, followed by at least one upper- or lower-case letter, digit, period, or dash, followed by one and only one period, followed by anywhere from two to six upper- or lower-case letters.

 

I don't see what else could be in an email address, so the second regex above seems perfect. 

 

And yet, Larry, since you're saying that the huge regex you posted is the actual email validation code, I'm baffled again. What are the vulnerabilities of the second regex above that would make it not work? 

 

Second Question:

 

Is escaping the period sign and the dash inside a character class necessary in the PHP version of regex? 

 

Or, to rephrase that: can it be [A-Za-z0-9.-] instead of [A-Za-z0-9\.\-]  ?

 

 

With greatest respect and admiration, would be grateful for further discussion. 


Your regular expression allows for invalid email addresses, too, and does not allow for some valid ones. It's far from perfect. And that's the problem with regular expressions and validating emails in particular: you *think* it's perfect, when it's not.

 

Knowing why your regular expression is flawed (in both allowing invalid email addresses and denying valid ones) requires understanding of the email specification. For example, the \.[A-Za-z]{2,6} is not required at the end of a valid email address, although it's quite rare not to have one. And I thought I once read that the first character in an email address could not be certain things, like a period, but I could be wrong on that. Read this to get a sense of the complexity: http://en.wikipedia.org/wiki/Email_address
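
To make both kinds of failure concrete, here is a small sketch that runs the pattern ^[\w.-]+@[A-Za-z0-9\.\-]+\.[A-Za-z]{2,6}$ against a few addresses. The addresses are illustrative picks for this sketch, not examples from the book, and the validity notes reflect my reading of the email address syntax rules (an unquoted local part may not start with a period, a domain may not contain an empty label, and a plus sign is allowed before the @).

```php
<?php
$pattern = '/^[\w.-]+@[A-Za-z0-9\.\-]+\.[A-Za-z]{2,6}$/';

// Illustrative addresses chosen for this sketch; none come from the book.
$cases = [
    '.user@example.com'        => 'invalid (leading period), yet accepted',
    'user@example..com'        => 'invalid (empty domain label), yet accepted',
    'user+tag@example.com'     => 'valid (plus sign is allowed), yet rejected',
    'user@example.photography' => 'valid form (top-level domain longer than six letters), yet rejected',
];

foreach ($cases as $email => $note) {
    $accepted = preg_match($pattern, $email) === 1;
    printf("%-26s %-9s %s\n", $email, $accepted ? 'accepted' : 'rejected', $note);
}
```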


The backslash is not technically required before periods and hyphens in classes but there was a bug some years ago that caused me to use it in those situations. You can go without it, and if you see errors in your PHP installation, then go ahead and escape them.
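
If you want to check this on your own installation, a sketch like the one below compares the escaped and unescaped classes on a few arbitrary strings. One assumption worth stating: the unescaped hyphen is literal here only because it sits last in the class; placed between two other characters it would define a range.

```php
<?php
$escaped   = '/^[A-Za-z0-9\.\-]+$/';
$unescaped = '/^[A-Za-z0-9.-]+$/'; // the hyphen is last in the class, so it is literal

// Sample strings are arbitrary.
foreach (['some-website.com', 'some_website.com', 'a.b-c', 'a b'] as $s) {
    $a = preg_match($escaped, $s);
    $b = preg_match($unescaped, $s);
    printf("%-18s escaped: %d  unescaped: %d\n", $s, $a, $b);
}
```

Both columns should agree on every row.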


Thank you, Larry! This is an important clarification. 

 

I wasn't just idly waiting for your reply, and did a bit of research. Turns out, there are at least two more answers to my initial question that appear to be correct. Instead of this: 

 

 

 

^[\w.-]+@[A-Za-z0-9\.\-]+\.[A-Za-z]{2,6}$
 

 

 

-- or this (without escape characters):

 

 

 

^[\w.-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}$
 

 

 

-- the same "perfect" email validation pattern apparently can be expressed a bit shorter, like this:

 

 

 

^[\w.-]+@[^\W_]+\.[A-Za-z]{2,6}$
 

 

 

-- or even like this:

 

 

 

^[\w.-]+@(?!_)\w+\.[A-Za-z]{2,6}$
 

 

 

I'm wondering if you, Larry, or other guys on the Forum, would agree to take a look at these two examples and confirm that they would do the job in PHP context. 
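
For anyone who wants to try these, here is a rough sketch that runs the explicit-class pattern and the two shorter ones side by side. As far as I can tell they are not interchangeable: [^\W_] also excludes periods and dashes, and (?!_) only looks at the first character after the @. The test addresses and array labels are just picks for this sketch.

```php
<?php
$patterns = [
    'explicit class' => '/^[\w.-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}$/',
    '[^\W_]+'        => '/^[\w.-]+@[^\W_]+\.[A-Za-z]{2,6}$/',
    '(?!_)\w+'       => '/^[\w.-]+@(?!_)\w+\.[A-Za-z]{2,6}$/',
];

$emails = [
    'somename@somewebsite.com',
    'somename@some-website.com',
    'somename@somewebsite.co.uk',
    'somename@some_website.com',
];

foreach ($emails as $email) {
    echo $email, "\n";
    foreach ($patterns as $label => $pattern) {
        printf("  %-15s %s\n", $label, preg_match($pattern, $email) === 1 ? 'match' : 'no match');
    }
}
```

In this sketch the explicit class accepts the first three addresses and rejects the underscore one. The [^\W_]+ version accepts only the first address, and the (?!_)\w+ version accepts the first and the underscore one.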

 

(I would understand if no one replies though: people seem to get annoyed by children when they ask endless questions, and I've never really dropped that childhood habit. It's not an empty curiosity in my case, however: I rather think of it as slightly extended "Review and Pursue" section.)


What do you mean by "do the job"? If you mean "only allow valid email addresses", then No, they won't do the job for reasons already explained. I hate to say this again and bring on your wrath, but the most reliable way to validate email addresses in PHP is to use filter_var().

 

But if by "do the job" you mean "Allow most valid email addresses without allowing too many invalid email addresses while using a regular expression because I don't/won't/can't use filter_var()", then Yes.


Oh, Larry, when one day you finally meet me in person, as I hope, you will realize that wrath and yours truly are completely incompatible. 

 

I got the point about filter_var() being the preferred method, thank you very much for reinforcing it, Larry.

 

Still, what I meant by regular expression that's designed to match an email address "doing the job" is this: validate only email addresses that may include letters, dots, dashes, numbers or underscores before @, letters, numbers, dashes and dots but NO underscores after @, and a certain reasonable number of letters after the last dot. No more, no less. 

 

Which is another way of saying, "validate about 99.999% of all email addresses used in countries familiar with Latin alphabet". Imperfectly, perhaps, but well enough. 

 

(I'm actually astounded by the fact that I somehow managed to express my desire to understand how to create that pattern in such confusing manner, that apparently anyone reading this thread found it easier to understand the complex subject matter of regular expressions a lot better than the simple thing that I was asking. But then again, English isn't my first language... )

 

Also, I hope no one would deny that this discussion, quite thrilling at times, has (thanks to everyone participating in it) added a bit of useful content to this already content-rich forum. 


Yes, that pattern--^[\w.-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}$--will restrict it to strings that contain only letters, numbers, underscores, periods, and dashes before an @, followed by exactly one @, followed by some number of letters, numbers, periods, and dashes, followed by exactly one period, followed by 2-6 letters. That's exactly what that does.

Great.

Now let's see if we can further improve it.

The pattern I've arrived at (thanks to everyone's help!) is this:

^[\w.-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}$

It will validate email addresses like these:

somename@somewebsite.com, somename@some-website.com, or somename@somewebsite.co.uk --

-- however, it will also validate these ugly things:

somename@some......website.com, somename@--somewebsite..com -- and similar things.

To fix that problem, we can try to replace the pattern after @ that includes some number of letters, numbers, periods and dashes:

[A-Za-z0-9.-]+

-- with the pattern that would validate if the string starts with (at least one letter or number, followed by either one period sign or at least one dash) used zero or more times, followed by at least one letter or number:

 

([a-zA-Z0-9]+(\.|-+))*[a-zA-Z0-9]+

Which results in the following regex:

^[\w.-]+@([a-zA-Z0-9]+(\.|-+))*[a-zA-Z0-9]+\.[A-Za-z]{2,6}$

Now I'd be curious to know what are other possible vulnerabilities in it that may be fixed. Would appreciate some input from the esteemed members of this forum!
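
For anyone who wants to try this, here is a rough PHP sketch that runs the sample addresses from this post through the pattern. It assumes the separator alternatives are grouped, as in ([a-zA-Z0-9]+(\.|-+))*, to match the description in words. Without the inner parentheses the alternation splits differently, and the second pattern in the sketch shows the effect. The extra address --somewebsite.com and the variable names are additions for this sketch.

```php
<?php
// Grouped form: each repetition is "letters/digits, then one period or a run of dashes".
$grouped = '/^[\w.-]+@([a-zA-Z0-9]+(\.|-+))*[a-zA-Z0-9]+\.[A-Za-z]{2,6}$/';
// Ungrouped form: each repetition is EITHER "letters/digits then a period" OR "a run of dashes".
$ungrouped = '/^[\w.-]+@([a-zA-Z0-9]+\.|-+)*[a-zA-Z0-9]+\.[A-Za-z]{2,6}$/';

$emails = [
    'somename@somewebsite.com',
    'somename@some-website.com',
    'somename@somewebsite.co.uk',
    'somename@some......website.com',
    'somename@--somewebsite..com',
    'somename@--somewebsite.com', // extra case, not from the post
];

foreach ($emails as $email) {
    printf(
        "%-32s grouped: %-3s ungrouped: %s\n",
        $email,
        preg_match($grouped, $email) === 1 ? 'yes' : 'no',
        preg_match($ungrouped, $email) === 1 ? 'yes' : 'no'
    );
}
```

With the grouped form, the first three addresses match and the last three do not. The ungrouped form rejects some-website.com and accepts --somewebsite.com. Either way, the part before the @ is still [\w.-]+, so consecutive or leading periods there are not caught.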


Hello Dimitri. I understand you have an intellectual curiosity about this, but in my opinion, you're spending a lot of time and effort to come up with an inexact solution to a problem that's already been solved. That, of course, is your right. But I've already provided a link to all the technical possibilities of an email address. I've already posted the regular expression that you'd eventually need to get to if you wanted to take this to the end. And myself and others have already stated how we would address this issue (using filter_var()). 

 

So I'm going to bow out of this discussion now. I'm always happy to help people (and that's what the forum is for), but I just don't have the time to help others on what I consider to be quixotic pursuits. Good luck with it.
