Jump to content
Larry Ullman's Book Forums

Some Questions About Database Normalization


Recommended Posts

Hi everyone,

 

having read the chapter on database design, I have a few questions about how to normalize my database.

 

Lets say I have a website about all of the museums in the world - what would be the best way to store location information? Can I have columns for city, region, country and continent all in one table, as below?

 

museum_id

museum_name

city_name

region_name

country_name

continent_name

other columns with foreign keys

 

I did have separate tables for each of the location columns but it becomes exceedingly difficult if I have a table with 500 museums and multiple columns with foreign keys. Using phpMyAdmin it isn't easy to see what's going on if I have rows consisting of many numbers. Isn't this confusing?

 

In chapter 6, Database Design, there is a publishers table on page176 which also has all of the location (address) information in the same table.

 

Are there any advantages/disadvantages to having the location columns in the same table vs separate tables with regards to SEO or the ability to sort/filter/search the contents of the table? By the latter I mean when a website visitor uses the website's checkboxes/autocompletes or text inputs to find/filter information.

 

One benefit to having separate location tables might be when it is necessary to include extra information pertaining to a specific region or city, such as attractions or accommodation types.

 

 

 

Thank you!

 

 

Link to comment
Share on other sites

Normalization is a process. You need to start what you need to do with the data, and normalize it from there. If you don't need location data for anything but displaying it to the user, you might as well save it in human-readable form on the museum itself. If you need to filter/sort the museums in regards to continents, countries or regions, you'll need separate tables for them all. With that said, if sorting/filtering is all you need to do with locations, a hybrid solution might do the trick. I.e, you save both a human-readable location on the museum and save foreign keys to the location parts.

 

The way to solve the problem with looking at data, creating views that display that information is what you need to do.

 

There's several ways to solve a sorting/filtering issue. You might do a simple Autocomplete textfield against the human-readable location, or sort them by a nested data structures that updates based on previous selections in drop-down lists.

 

You are correct about the latter part. That is one of the reasons why you'd want to normalize the data properly. To summarize, you need to explain your use cases a little more in detail for us to really help you out. Your use cases/future uses cases are the most important aspect for how to normalize the data.

  • Upvote 2
Link to comment
Share on other sites

Hello Antonio,

 

thank you for your thoughtful response. I like your proposal of implementing a hybrid solution - perhaps I can have one column in the museum table that lists all of the location details and still have the foreign keys as usual.

 

One thing I can't understand is why separate tables are necessary to filter or sort the information. If I need to find all museums in Canada, and I have the actual country name in the museum table, won't it then return those specific records?

 

You asked me about my uses cases. Well, for now I'm attempting to use the checkboxes to sort/filter the records. That is if I manage to get them working. I'll probably end up using something like an autocomplete as well.

 

 

Thanks again for your help.

 

Link to comment
Share on other sites

The short answer is that you do not need separate tables for filtering and sorting. You need separate tables to help preserve the integrity of the data.

 

As an example, imagine that you had all your museum data in one table (including country names). Now imagine that for one reason or another, the name of a country changes. You now have to go into your database and change the name of that country everywhere it occurs (i.e., for every museum in that country).

However, if you have your DB properly normalized across several tables, then you would only have to change the name of the country in one place, and that change would automatically carry across your entire DB, thus making it much easier to preserve the integrity of the data.

 

Also, please keep in mind that my example above is very trivial, but in a real DB with millions of records, having the data properly normalized is paramount to properly maintaining everything and not letting your data become invalid.

  • Upvote 1
Link to comment
Share on other sites

At the end of the day, you should do whatever works for you, but one think I've learned the hard way is that once a site is launched and even just a little bit big, changing the DB structure around is about the most difficult thing to do for the site. In fact, it's pretty much impossible.

As such, it's best to follow relational DB best practices, even if they don't all make sense at first (and some best practices will seem like overkill for smaller sites).

 

And always, if you don't understand something, don't be afraid to ask about it here.

Link to comment
Share on other sites

 Share

×
×
  • Create New...