Jump to content
Larry Ullman's Book Forums

Trying To Understand The Effect Of The On In The Sql


Recommended Posts

UPDATE: note, to save you some time reading, you may want to read my next post (the third post on this thread) instead. After Matt responded to the OP, I realized that the OP might not be clear enough, because Matt answers a question that I wasn't trying to ask. In the third post below, I hopefully make the question clearer.

 

In an effort to better understand this SQL quoted below (which is from page 196), I ran this SQL twice, except the second time I ran it, I removed from line 4 this code

ON s.id = sc.size_id

 

the first time I ran it, there were 10 results, with 10 different SKU ids. In other words, no duplicate results.

 

The second time I ran it (removing ON s.id = sc.size_id), there were 30 rows returned, and therefore many duplicate results.

 

the problem is, I don`t understand

 

a) how s.id = sc.size_id limits the results to 10 without allowing duplicates.

 

2) why, upon removing that code, did it return 30 results, as opposed to 27 or 94. 30 must represent the total number of something but what?

 

 

 


SELECT gc.description, gc.image, CONCAT( "C", sc.id ) AS sku, CONCAT_WS( "-", s.size, sc.caf_decaf, sc.ground_whole, sc.price ) AS name, sc.stock
FROM specific_coffees AS sc
INNER JOIN sizes AS s ON s.id = sc.size_id
INNER JOIN general_coffees AS gc ON gc.id = sc.general_coffee_id
WHERE general_coffee_id =3
AND stock >0
ORDER BY name ASC
LIMIT 0 , 30

Link to comment
Share on other sites

1) how s.id = sc.size_id limits the results to 10 without allowing duplicates.

To understand why this is happening, then we first have to understand what the "ON" clause is doing within an INNER JOIN.

 

This is taken from the mysql website:

Generally, you should use the ON clause for conditions that specify how to join tables, and the WHERE clause to restrict which rows you want in the result set.

 

The "ON" clause is used in an inner join query to select data from 2 tables where a match is made between data in one column from one table and one column in the other. In your example, "ON s.id = sc.size_id" is looking for matches between data in the "s.id" column and "sc.size.id". Wherever a match is made, then those records are being selected, if and only if, they match the data defined in the "WHERE" clause.

 

I hope that makes some sense.

 

Larry gives a nice crash course in "oins" in his "PHP 6 and MYSQL 5" book!

 

2) why, upon removing that code, did it return 30 results, as opposed to 27 or 94. 30 must represent the total number of something but what?

 

What it sounds like you are getting is what's called (in systems analysis lingo) a "Cartesian Product".

 

A cartesian product of multiple input sets is "a larger set containing every ordered combination of the input set elements." Translated into plain English, this means that each and every row in the first table is joined to each and every row in the second table (hence, the increase in the number of records output in your query). Failure to define a join conditional - such as an "ON" clause - when creating inner joins will result in Cartesian Products!

 

Hope that helps!

  • Upvote 1
Link to comment
Share on other sites

Thanks for trying, but you didn't answer the question that I was trying to ask. it's my fault for not being clear enough :)

 

I understand the general purpose of ON. However, I didn't understand how this PARTICULAR ON worked in this instance. Namely, why does useing the limit of

ON s.id = sc.size_id

produce a result with 10skus and none of them repeated. In order to answer this question, you would probably have to look at those tables (see images below), and not just the theory behind ON.

 

In other words, I looked at those tables, and I never would have thought that creating a query of

ON s.id = sc.size_id

would produce a list of the 10 Skus with no repeats.

 

How did Larry know that creating that ON would limit the results to the 10 unique skus with no repeats.

 

Here are the two tables for your reference. Why does joining them where s.id = sc.size_id create a result of 10 (no repeats), whereas a list of 30 is created without this join. How does the join eliminate the repeats.

scsize.jpg

 

scspecific.jpg

Link to comment
Share on other sites

Thanks for trying, but you didn't answer the question that I was trying to ask. it's my fault for not being clear enough :)

 

Sorry MikeMikeMike! I didn't know how far along you were with using SQL.

 

Namely, why does useing the limit of

ON s.id = sc.size_id

produce a result with 10skus and none of them repeated. In order to answer this question, you would probably have to look at those tables (see images below), and not just the theory behind ON.

 

The answer to this question is fairly easy! You are essentially selecting all records from the specific_coffees table WHERE the general_coffee_id = 3. If you look at the specific_coffees table, there are exactly 10 records, and all of them have a general_coffee_id of 3, so you should therefore have 10 records in your query result!

 

As for why there are 30 records when you take out the ON clause, I would need to look at the query more and think about it. Whatever the reason, there is some sort of a Cartesian Product going on there. It might have something to do with the specific_coffees table (which has 3 records) and the general_coffees table (which has 10 records). 3 x 10 = 30! Again, I would have to sit and look at the query a bit deeper to be certain, but it's getting late here.

 

I hope that was a little better.

  • Upvote 1
Link to comment
Share on other sites

Unfortunately, I'm almost 100% sure you're not right.

 

With this On staement

ON gc.id = sc.general_coffee_id

In the SQL (along with the WHERE general_coffee id =3), but this ON statement

ON s.id = sc.size_id

ommitted from the SQL query, there are 30 results with duplicates of the SKUs, therefore this SQL

ON gc.id = sc.general_coffee_id WHERE general_coffee_id =3

does not limit the query to 10 results. It gets all ten types of specific coffees (which is obvious because there are only 10 types) but it also allows for duplicate records of the returns, which is 30.

 

This code

ON s.id = sc.size_id

removes the duplication, and I would like to know why if someone knows.

 

Also, as mentioned in the OP, the 30 rows with duplicates must represent the total value of something but what.

 

Maybe Larry or someone else can answer when he has time.

Link to comment
Share on other sites

Mike, in another post you specifically asked me to look at this post. You may not like my answer, however. First of all, per the forum guidelines, I'd really like to see you including your versions in use in all your posts. That information is almost always useful for those trying to assist you, and even when it's not, it's better to have more information than less. But the versions in use is what I consider to be just about the most important information, if not THE most. Which, of course, is why the forum guidelines specifically suggest you include such information. In fact, here's how it's stated in the guidelines:

 

Failure to abide by the above, in particular not including the versions you're using, will virtually guarantee that I don't answer your question or that it takes longer for your question to be answered.

 

 

So if you want me to answer your posts, I'd appreciate it if you'd abide by the guidelines. It just makes my job (in this specific case, providing free help to strangers) a little bit easier. Many people add these details to their signatures, so it's never missing.

 

Second, although you say that you're "almost 100% sure [Matt is] not right", he's essentially 90% right. He's describing how JOINs work, which is all that's happening here. You're asking a really basic question on how JOINs work and he's trying to answer you and he's much closer to being right than you are. Much closer. From my perspective, Matt's given you two correct answers and you've told him he's wrong twice. As you might imagine, such situations make me less inclined to chime in with the same correct answer. But since you've specifically requested that I chime in, here's that same correct answer in more detail...

 

In terms of the numbers, based upon the tables, I expect you're actually getting 50 records qualifying, not 30. You're only getting 30 records at the end because of your LIMIT clause. The reason for the 50 matches is because of Cartesian Products, as Matt said. There are 10 records returned by the JOIN across general coffees and specific coffees. There are 5 sizes. When you join 5 sizes with 10 records, you get 50 matches: 5 matches for each of the 10 records (again, this is what Matt was saying). The ON clause restricts how a match is made, specifically, THIS size_id must match THAT size_id. The effect is that instead of all 5 sizes being returned for each of the original 10 records, only the 1 matching size will be returned. This is why pretty much all JOINs should use an ON, USING, or WHERE clause to restrict the matches. That information is basic JOIN know-how, by the way. So the answer to the question of "how did Larry know this ON clause was needed?" is because that's a standard JOIN. This is a JOIN across multiple tables, but the premise is the same whether you JOIN 2 tables or 20: if you don't limit the JOINs to matching records (using ON, USING, or WHERE), you'll get Cartesian Products. If you removed the other ON clause, you'd get X specific coffees times Y general coffees times Z sizes results: a Cartesian Product.

 

And my thanks to Matt for helping out. I really appreciate it!

Link to comment
Share on other sites

MikeMikeMike, I literally reiterated the same thing in your other post just now. It looks like Larry and I answered your question at the same time. Also, Matt was right all along.

 

Larry, you ever consider bumping Matt's status up for two classy answers in a row in the face of adversity?

Link to comment
Share on other sites

Larry,

 

Thank you for answering the post with a clear explanation. By mentioning in your post that you are giving free advice to strangers, it makes you sound a little upset. The only reason I asked is because your books (which I bought) encourage me to ask questions here if I don`t understand, so I`m a little disappointed to get this put back in my face. Yes, i am a stranger, but I am also a customer and your books encourage me to post here. I don`t think it`s a good business practice to refer to your customers that way, especially when you encourage them to come to your forums.

 

But you also seem to imply that you are upset (or less inclined to answer) to provide this advice because Matt already provided the answer and you believe he was correct, or 90% correct. I totally disagree and I think your answer proves the point.

 

Matt provided two posts. In one post, he answered a question that wasn`t asked, and in the second post,when he tried to explain in detail the actual question that was asked, he was 100% wrong -- at least as far as I can see and which your answer shows.

 

For example, you explain the answer here

 

There are 10 records returned by the JOIN across general coffees and specific coffees. There are 5 sizes. When you join 5 sizes with 10 records, you get 50 matches: 5 matches for each of the 10 records (again, this is what Matt was saying). The ON clause restricts how a match is made, specifically, THIS size_id must match THAT size_id. The effect is that instead of all 5 sizes being returned for each of the original 10 records, only the 1 matching size will be returned.

 

This is exactly the information I was looking for.

 

Matt, in his second (incorrect post) said this which is 100% wrong

 

The answer to this question is fairly easy! You are essentially selecting all records from the specific_coffees table WHERE the general_coffee_id = 3. If you look at the specific_coffees table, there are exactly 10 records, and all of them have a general_coffee_id of 3, so you should therefore have 10 records in your query result!

 

 

because Matt is essentially claiming that it is this code

 

ON gc.id = sc.general_coffee_id WHERE general_coffee_id =3

 

that limits the result to 10.

 

Contrary to what Larry says, I was not asking how JOINS work. I was asking how this joined work in particular. Given that Matt also didn`t understand the proper answer, it might not be as simple as you imagine for people at our level. yes, I know how JOINs work and have read about them, but understanding examples can be more difficult.

 

Furthermore, Larry seems a little surprised I told Matt he was wrong twice. However, if I don`t agree with Matt`s answer, what am I supposed to do? Upon examination of Larry`s answer, I believe i am correct in saying that Matt was wrong!

 

Furthermore, for your next edition, as you claim in your books that you can explain Geek to Normal people, I think it would be useful to into more depth sometimes because it`s clear that people aren`t getting it!

 

Thank you for your assistance.

 

I will provide the version information next time I come.

Link to comment
Share on other sites

MikeMikeMike,

 

Sorry if my answer was not clear enough for you. I was just trying to help!

 

As far as why you were only getting 10 results for the query which included the "ON" clause, I was 100% right!

 

I originally said "The answer to this question is fairly easy! You are essentially selecting all records from the specific_coffees table WHERE the general_coffee_id = 3. If you look at the specific_coffees table, there are exactly 10 records, and all of them have a general_coffee_id of 3, so you should therefore have 10 records in your query result!"

 

The key parts from the original query are:

SELECT ...
FROM specific_coffees

and

WHERE general_coffee_id =3

The "ON" clause merely limits the results so that only 1 record from the sizes table is matched with each specific_coffees record.

 

As for the second question, yes I was a little off the mark!

 

The reason for the 50 matches is because of Cartesian Products, as Matt said. There are 10 records returned by the JOIN across general coffees and specific coffees. There are 5 sizes. When you join 5 sizes with 10 records, you get 50 matches: 5 matches for each of the 10 records...

 

This was exactly the conclusion I came to when I first read your post. However, since you said you were getting 30 records in your result, I decided not to mention that and I got sidetracked and started looking at it another way.

 

You're only getting 30 records at the end because of your LIMIT clause.

 

I completely overlooked the LIMIT clause, so my apologies for that. It was almost 3:00am in Japan when I wrote the last post, so I wasn't in the best condition to do any deep analysis of Larry's query. So, yes I was wrong as far as the second answer being exactly correct. Nevertheless, as I had said, a Cartesian Product was involved and I thought I explained that fairly well. And, as it turns out, the answer was far simpler than I had thought (i.e. the LIMIT clause restricting the result set to 30)!

 

I only try to post answers in this forum to questions that relate to areas I am fairly confident in!

That being said, I was an IT major and took several classes in database design and analysis over my university career. Although I may not be spot on in every answer, I still generally know what I'm talking about!

  • Upvote 1
Link to comment
Share on other sites

Hello Mike,

 

These kinds of threads, with a tit for tat, can quickly escalate, so I'm going to try to provide a couple of gentle replies and hope this doesn't get out of control.

Thank you for answering the post with a clear explanation. By mentioning in your post that you are giving free advice to strangers, it makes you sound a little upset.

 

Not upset at all. That's the perspective I have with these forums and the "strangers" are everyone: me, you, HartelySan, Matt, etc. In my mind, when people are looking for help, that's the perspective I'd like them to consider. I'm not saying you aren't doing this, I'm just saying that's the perspective I have and that I'd like people to consider.

 

The only reason I asked is because your books (which I bought) encourage me to ask questions here if I don`t understand, so I`m a little disappointed to get this put back in my face.

 

Yes, the books encourage readers to ask questions here. I'm not trying to put that back anywhere. And part of the reason I encourage people to ask questions here is that compared to being asked questions via email:

- Other people (besides me) can provide an answer.

- Other people can benefit from the exchange.

 

Yes, i am a stranger, but I am also a customer and your books encourage me to post here. I don`t think it`s a good business practice to refer to your customers that way, especially when you encourage them to come to your forums.

 

Yes, you are a customer but you're also a stranger. I don't think of that as a negative but perhaps you do. I'll stop using the "S" word from here on out. But even if you are my customer, you're not Matt's customer, he gets absolutely nothing out of trying to assist people for free (well, not absolutely nothing but he certainly has less to gain from doing so than I do, as a person trying to sell books). Moreover, because I very, very, very much appreciate anyone willing to help out in these forums (it makes a world of difference to me), it's critically important to me that such helpers feel appreciated.

 

I'll also say that I view these forums as a bonus to the books. That's my perspective which I expect some/many readers don't have (not saying you feel this way or not). The books should hopefully stand on their own and the forums are a bonus, above and beyond the price of admission. So I'd hope that you (the reader in general, not you specifically) feel that the $25 or whatever that you paid for the book, of which I'll get about $2, was worth it, and that you (general) appreciate the forum as being a bonus. Again, that's my perspective.

 

 

But you also seem to imply that you are upset (or less inclined to answer) to provide this advice because Matt already provided the answer and you believe he was correct, or 90% correct. I totally disagree and I think your answer proves the point.

 

Yes, to be clear, I am not (and was not) upset but I was less inclined to answer primarily because:

1. Matt seemed to be answering the question correctly, if not to your satisfaction.

2. You are saying Matt was wrong, when I think he was right (even if he wasn't explaining things to your satisfaction).

 

From my perspective, the thread looked like this:

Question

Right answer

No, I'll rephrase.

Right answer.

You're wrong.

 

That's a very simplified version, of course, and you were more delicate than that, but if I'm interpreting the thread this way, why would I answer the question? In my mind you've already been given, and rejected, two right answers. What's the benefit of a third?

 

Matt provided two posts. In one post, he answered a question that wasn`t asked, and in the second post,when he tried to explain in detail the actual question that was asked, he was 100% wrong -- at least as far as I can see and which your answer shows.

 

Respectfully, I disagree with your interpretation. I'm glad you felt my answer was helpful, and if you feel Matt's answers were wrong or just not helpful, that's your choice, but I don't think it's reasonable to blame me for not being inclined to answer a question that I felt was already answered correctly.

 

Contrary to what Larry says, I was not asking how JOINS work. I was asking how this joined work in particular. Given that Matt also didn`t understand the proper answer, it might not be as simple as you imagine for people at our level. yes, I know how JOINs work and have read about them, but understanding examples can be more difficult.

 

I can appreciate that understanding specific examples can be more difficult, just as in math class in school, they show you how to do A and then the quiz does B, which is a slight variation on A, and it seems like it's out of nowhere. To me, as I wrote before and with which you disagree, it seemed like you weren't following a basic principle of a JOIN. I understand that you're saying you weren't following this particular part of this particular JOIN.

 

Furthermore, Larry seems a little surprised I told Matt he was wrong twice. However, if I don`t agree with Matt`s answer, what am I supposed to do? Upon examination of Larry`s answer, I believe i am correct in saying that Matt was wrong!

 

To you, my reaction is strange, because you think Matt was wrong. To me, Matt's answers weren't wrong so... In answer to your question, if you don't agree with Matt's answer, a more diplomatic way to handle it (i.e., in a "free help from strangers" mentality), would be to express appreciation for the attempt but admit that you're still confused. Even if you did believe Matt was wrong, which you clearly do, changing the "almost 100% sure you're not right" to "Thanks but I'm still confused" would have totally changed the tenor of this thread. Not to hammer the "free help from S" angle, but I think it's good policy in such situations to always err on the side of being too appreciative. Somebody you don't know, perhaps halfway around the world, is giving up part of their day to try to help you and expecting nothing in return. That's a huge thing, in my opinion.

 

Furthermore, for your next edition, as you claim in your books that you can explain Geek to Normal people, I think it would be useful to into more depth sometimes because it`s clear that people aren`t getting it!

 

Four things here. First, the "Translating Geek to English" blurb is clearly a marketing thing. It's intended to be true, but it's not a contract. Second, you're entitled to your opinion of course, and you clearly see what's happened here differently than I do or Matt does or HartleySan does, but this little comment here verges on being insulting. In fact, it may be insulting, but I'm trying not to take it personally. Third, as I've just stated here, I do feel like your question was one that basic knowledge of a JOIN would have made clear (and, yes, I know you feel that's not the case), and I do feel like this question is below the intended audience of the book. Not trying to be insulting here. I wrote the book for experienced PHP and MySQL developers (I say this in the introduction and on its Amazon page), so in my mind, this kind of question wouldn't have come up for this book. Yes, you say this isn't a basic JOIN question and that you understand JOINs and I'll take you at your word. And, even if a reader (some other reader, not you) wasn't quite where they needed to be for this book and asked questions that I didn't expect, that would be absolutely fine, because that's what the forum is for, but when that same reader is telling people that are trying to help that they're wrong, that's a problem if I feel the answers are right.

 

Fourth, and most importantly, if there's something that's not clear in a book, I absolutely want to know about it so I can clarify the issue in the next edition. Absolutely. That's one of the benefits I get from the forum: it makes me a better writer and leads to better books. I'm currently writing the fourth edition of my PHP and MySQL book and I'm still expanding things, changing things, trying to make improvements so it's a better book for readers. That's my goal. And this book has been out a few months and I already have a handful of ways I want to improve the next edition (i.e., a running list of how I could have done better). But I'll be honest: this isn't going to make that list. In my mind, this is a question about a basic JOIN.

 

Thank you for your assistance.

 

I will provide the version information next time I come.

 

You're welcome and that would be appreciated. I'm going to re-read my response in the hopes of catching anything that might offend, as that's not my intention. The fundamental conflict here is that you, the person asking for help, has one sense of what's happened and at least three people trying to help have a different sense. I hope that you're not offended by anything I wrote, that you appreciate the efforts being made here, and that we can move on.

 

 

 

Link to comment
Share on other sites

 Share

×
×
  • Create New...