Jump to content
Larry Ullman's Book Forums

Recommended Posts

This week, I thought i'd take the time to work more appropriately with Databases and JOINS. So to do this I'm mocking up a website and I'm going to store all the content for it (most of the content) in a db. I've tried to keep it within the normalization standards and am now playing with some JOINS. What i'm finding though is that the first join i'm doing is providing a duplicate set of results. Can somebody outline what I am doing wrong here please.

 

This is my query:

 

SELECT content.headline, content.content, links.link, links.anchor FROM content INNER JOIN links ON content.page_id = links.page_id

 

And my tables:

 

`content` (this is for the various pieces of content i'll have

 

content_id int(11) NO PRI NULL auto_increment

content text NO NULL // actual content

headline varchar(255) NO NULL // headline of title

page_id int(11) NO NULL // what page the content belongs to

date_added datetime NO NULL // probably not needed but it's for my reference

 

`link`

 

content_id int(11) NO NULL // Foreign key and content that link belongs to

link varchar(255) NO NULL // actual href value

anchor varchar(255) NO NULL // anchor text for that link

link_id int(11) NO PRI NULL auto_increment // Link primary key

page_id int(11) NO NULL // page that link belongs on

 

 

`meta_descr`

 

meta_descr_id int(11) NO PRI NULL auto_increment // primary key

keywords varchar(255) NO NULL // keywords content

description varchar(255) NO NULL // description content

content_id int(11) NO NULL // foreign key and content that meta info belongs too

date_added datetime NO NULL // for my beenfit

 

`pages`

 

Field Type Null Key Default Extra

page_id int(11) NO PRI NULL auto_increment // Not sure i need this table?? But one of the guidleines on NF made me think I did.

meta_descr_id int(11) NO NULL

 

 

Love to hear your thoughts on my issue (Is it possible to just use DISTINCT to remove duplication? However I feel that if possible this would be a hack rather than proper practice). And also whether you disagree/agree with my attempt at normalization.

 

Thanks

Link to comment
Share on other sites

Think I figured out my problem. I should have used:

 

SELECT content.headline, content.content, links.link, links.anchor FROM content INNER JOIN links USING (content_id)

 

But still open to hearing people's ideas on this topic, if they have any :)

Link to comment
Share on other sites

Yeah, definitely, you want to join on content_id, not page_id. I mean, if the links link to the content, I can't imagine links and corresponding content being on the same page. Well, I could imagine some other content and links on a page, but I digress.

 

Also, since the field names are the same for the keys, USING (content_id) works fine. You could also do ON content.content_id=links.content_id.

 

Lastly, I'm assuming that your links table is named links, not link, right? I think that might have been a typo on your part.

 

More in general though, if each piece of content has a unique link and anchor, then separating the info into two tables seems unnecessary to begin with.

 

I don't mean to discourage you. Just something I thought of. Of course, I don't know the full situation either, so perhaps my comment is completely irrelevant.

 

As you may recall, I started that post regarding joins a week or so ago. Through that post, everyone's suggestions and really thinking about joins a lot, I think I finally understand them. Now, when I look at the examples in Larry's book, they're so simple. It's amazing the change.

 

Anyway, you seem to be at the same point as me, so it's good to know that you're taking that knowledge and applying it. Good luck.

  • Upvote 1
Link to comment
Share on other sites

Thanks for your replies.

 

To elaborate on my schema. This example I wanted to pick a scenario that wasn't out of a book as it's too easy to just replicate tables, columns and queries and then think you've got it nailed. The content table houses all the content for the site. It could be for instance 4 separate pieces of information on the index page that outlines what you do. So the `headline` would be 'service one' and the `content` would be the opening paragraph lets say. The page_id for index is say 1 as it's the landing page, page_id for 'about.php' could be 2 and so on. The content_id is the primary key and used as a foreign key in other tables.

 

The `links` are housed in a separate table because not all content will have a link. This happens because the content on many but not all pages is not in it's entirety. For example: the `headline` 'service 1' and `content` 'this is some intro text about service1' will have a link ('service.php?s=service1') to read more about this service. But the full main article will not need any 'further reading' or needs for links to go anywhere as it is in it's entirety. Hence why there is a separate links table.

 

The meta table is again used to pull a more relevant set of meta tags information as a lot of the pages are dynamic and make use of get variables.

 

I hope this clears up a little the schema.

Link to comment
Share on other sites

I thought about your schema for quite a bit, but am still confused.

 

To me, and again, this is just off the top of my head, it seems like with things like articles, there is no chance of data redundancy. I mean, each article has a title, content, a general description, a link to the article (I think) and perhaps some metadata about the article.

 

I don't see why all this information can't be in the same table, since it would more or less be unique to each article.

 

I suppose the one exception would be the metadata, which might be shared, in which case, by all means, make a separate table for that.

 

Still though, I'm having trouble imagining the need for all those tables.

 

I do trust you know what you're doing though, Jonathon, so more than anything, I just want to understand the logic behind your decision.

Link to comment
Share on other sites

Yes in terms of articles and blogs you are correct. I neglected to mention those tables. The idea is fairly rough and I am still not 100% on some aspects of its design but we'll see.

 

I shall try to explain, because I do feel I am having trouble getting it across :-( So I shall do it in bullet form(ish).

 

- ALL content is stored in the database in this example (even though in some case it would be simpler just to code it in HTML. This was in order to make the concept complicated and unique.

- You have a website. This website offers 3 services

- photography

- sales of cameras

- Camera maintenance

 

- On the index page there is a small introductory text that is unique to that page about each of the 3 services. This is not just the first x characters of the main content, this is to avoid a duplicate content issue. Each service's small introductory text has a link to to that service in full.

- So photography's intro text will lead to services.php?service=photography or services/photography

 

So from the index page we have a table that needs:

 

- content_id (primary key)

- content (introductory text on the service)

- link (where does the introductory text take me to so I can read in more depth about this service)

- anchor (I'm trying to imagine I was doing this properly and so I'd want strong internal links that contain good SEO structure, rather than omitting this table and just generically putting 'read more')

- headline (So again I can place this between <h1> tags for search engines. i.e:

<h1>photography</h2>
<p>This is some intro text about photography<p/>
<a>Great photographers</a>

 

But this is not the table I have used!! I shall show you why

 

I click on services/photography and the page needs to load:

 

- headline ("Photography at a glance")

- content (full content on the topic photography)

- content_id (primary key)

- There is no need for a link on this content or subsequently any anchor text because the content doesn't go anywhere and putting the same generic link 'back to somepage.php' would not conform to NF rules

 

Therefore: I made two tables:

 

Table 1 - content:

 

- content_id

- content

- headline

 

Table 2 - links:

 

- content_id (foreign key, meaning the join pulls the content and links that go together)

- link (self explanatory)

- anchor (self explanatory)

- link_id (primary key, not all content will need a link (hence two tables) so at some point there will be:

- content_id = 5

- link = 'services/photography'

- anchor = 'My favorite photographer'

- link_id = 4

 

I threw in a meta-description table because as the services page relies on get variables it will need to alter dynamically the meta tags depending on the content being displayed. But not all content will need a meta tag, like the introductory text on the index page won't so putting these in the same table would result in empty rows.

 

I hope this goes some way to explaining my reasoning. You may also notice that i had columns and a table called related to page_id. I'm not sure i need it, but when I was working this out some reason made me think that I did. I quite possibly don't. I'm sure i'll remember why at some point I put it in.

Link to comment
Share on other sites

Thank you for taking the time to explain all that. I know how frustrating it can be when you explain something, and the person on the other end doesn't seem to get it.

 

I do think I understand it now though. With that said, I still think you only need one table for what you're doing. Errr, let me rephrase that. I think you only need one table for the way you're doing what you're doing. Using more than one table makes it seemed really forced. You even said yourself that you want multiple tables, so that you can practice joins.

 

If it were me, I'd bring things back closer to Larry's coffee site. Basically, you could have different categories for camera equipment, and within those categories, individual cameras. And within the cameras table, I'd add little blurbs about the cameras, etc.

 

Anyway, do what you gotta do, but if you're organizing your site on blocks of content, multiple tables does not seem necessary.

Link to comment
Share on other sites

Without reading your newest post, I recommend you to write VIEWS to display the information needed. Try writing them general so you can add WHERE clauses when displaying data.

 

To actually identify WHAT you want to display, and figuering out HOW to display it, is a much harder task than creating tables for the content. I recommend you start that way. :)

Link to comment
Share on other sites

Thanks to HartleySan and Antonio for your replies. I shall look into Views for MYSQL. Yes I do agree with you HartleySan regarding camera sales. A similar table DB design to Larry's coffee site would be ideal. I enjoyed playing around with joins and spending some time to better understand them, so the main objectie has been achieved. But.. if I may, let me propose this scenario to you (or anybody who wants in), it's only fictitious, so there is no rush and no obligation, but I hope it gives you an insight to my thoughts on design:

 

What table structure would you have for this scenario:

 

1 - All content must be in a DB

2 - There are 2 pages

3 - Index.php + person.php

4 - on index.php there are 3 divs.

5 - Each Div contains the following:

       a) <h1>A Person's Name</h1>
 <img src="path/to/image" alt="specific to person" />
       c) <p>A small 300 character long introductory text (This must be unique and not just the first 300 characters from a larger set of text)</p>
       d) <a href="person/persons-name">related anchor text(Not just 'read more')</a>

 

6. we'll ignore person.php solely and concentrate on dynamic variations of person.php

7. So there can be person/person-a or person/person-b or person/person-c depending on what person the user wants to read about

8. The page must load:

       a) accurate titles for the page
 accurate meta description + keywords
       c) <h2>The person's name</h2>
       d) <p>A unique full length, several hundred words of content</p>
       e) And that's it

Remember, all image alt tags, anchor text, meta description, meta keywords and page titles must be seo friendly and accurate.

Also remember that this is a small example. But if it were a fully blown site there could lots of dynamic variation of pages and lots of content, alt tags, meta etc that would need to be accurate

 

- Bearing in mind that on index.php there is no need for each piece of content to have it's own meta tags as the index page itself would have its own meta tags already about the site. But on person.php each piece of content does need appropriate matching meta tags. So straight away you would have a situation where, if you put all the information in 1 table, there would be some content that wouldn't require a `meta description`, `meta keyword` or `meta title`. So potentially 3 rows could be empty on certain rows.

 

- Also, on person.php the content doesn't need to have a link or any subsequent alt tags or anchor text as the content doesn't go anywhere so there's nothing to link to. So if we have 1 table, there will be rows of content that would have an empty `anchor column`, empty `link column`, empty `alt_tags column`.

Link to comment
Share on other sites

I'm kinda hoping that either Larry or Matt will chime in eventually, as their database design knowledge far exceeds mine (it's one of my weaker points). However, I still feel that one table is fine.

 

I think at this point though, it comes down to a matter of preference; how far do you want to go to normalize your database? Keep in mind that normalizing your database does help decrease the odds of data integrity issues, but it also comes at the cost of performance. Having to occasionally use NULL values when appropriate is perfectly fine, I think.

 

I feel like where my thinking differs from your is that you are thinking that for the home page, you'll only need certain pieces of information, whereas on the individual people pages, you'll need other pieces of information, and you are seeing this as an impetus to create multiple tables, one for each page. This, to me, does not equate to the need for more than one table. I think it simply means that you only need to pull the key pieces of info for particular records, depending on the page.

 

As for SEO friendliness and accuracy, that simply comes down to two things, I think:

1) Using good, semantic HTML, which is unrelated to the database structure.

2) Picking good terms/values for the meta tags, which again, is unrelated to the database structure.

 

So in the end, I'm sticking with my idea that one table is okay, but I'm also willing to relent on the fact that having more tables isn't wrong, but more a preference thing. I tend to err on the side of performance, even if that means using a few extra NULL values here and there. Similar to another post Larry made recently, if you feel that you will continually be using the same content for a field, then by all means, pull that field out into a second table. Again though, I think that's a preference thing.

 

To me, if the values in a field are not dependent on each other, even if there is some repetition, that's fine. For example, a listing of persons will more than likely have multiple John Smiths, but I would not personally see that as grounds to create a foreign key for the name, and store the name John Smith in a separate table. Again, some people might disagree with me, but I don't think either is wrong, and common sense tells me not to take things that far.

 

To be more specific with your provided example, I'd have a persons table with the following structure:

 

id
name
metadata // Perhaps comma separated, or whatever is necessary.
page_title // If this is just some variation of the person's name, this might not be necessary.
short_desc
full_desc
image_path
last_modified

 

And for the home page, I'd pull the name, image path, short description and ID from the database. The ID would be used for the links, of course. And from the individual pages, I'd pull the name, page title, metadata and long description from the database. Of course, this would be for the record with the ID retrieved via the GET method.

 

Anyway, I can't think of any simpler, more straightforward way to do it. If you want to complicate things with more tables, go ahead, but I don't think it's necessary (but it's not wrong either).

 

Sorry if my post got repetitive there. Just wanted to make sure my point was clear.

Link to comment
Share on other sites

Hello HartleySan,

 

Thanks for your reply. I understand your view point and see that in reality you wouldn't store all info in a DB and 1 table is perfectly manageable. That's what I would do. However, this situation I made complex in order to create multiple instances of void data.

 

Where you said:

I feel like where my thinking differs from your is that you are thinking that for the home page, you'll only need certain pieces of information, whereas on the individual people pages, you'll need other pieces of information, and you are seeing this as an impetus to create multiple tables, one for each page. This, to me, does not equate to the need for more than one table. I think it simply means that you only need to pull the key pieces of info for particular records, depending on the page.

 

That isn't strictly true, there isn't one table per page, there would be 3 tables for the site, that in reality could have multiple pages. It is all down to preference, but using this extreme rule set I made, I felt that it allowed for a use of joins. But again it's all preference, had this of been a real site, then my content would have been more mixed between DB and hand code. But for me, there is still too much data that can be empty.

 

You missed off the anchor text, link itself and also the alt tags. Again this comes down to personal interpretation as with so many web related things. But for me SEO is more than semantic code. For instance, the alt tags are the fourth thing a search engine scans on loading a page. So it was important that they are tagged properly and hence there inclusion in the DB.

 

So when I tried to break down all the elements of a page i ended up with occasions where, `meta description`, `meta keyword`, `alt tag`, `link`, `anchor text`, `into_text` `img_path` could be empty. This for me just seemed wasteful i'd say nearly 40-50% at any one time could be empty. So by using a content_id for an individual content entry, that allowed for me to use it as a FK that allowed for 3 tables, no null fields.

 

Thanks for your response though, I think we've probably taken this (some what unrealistic example) as far as possible :lol: . But i've appreciated your input. And more importantly, had a good play with joins

 

;)

Link to comment
Share on other sites

Sorry for neglecting the anchor text, the link and the alt tags. All the same though, I'd just add them to the one table, even if at least one value is NULL 50% of the time.

 

Well, like you said, we've kinda beaten a dead horse with this one, so we'll just let it go.

 

In the end, you got a lot of join practice, and it really does come down to preference at a certain point, so I can't say that one is better than the other.

Link to comment
Share on other sites

 Share

×
×
  • Create New...