Normalization

TheWildJacko · June 28, 2012

Hi,

I'm really enjoying working through this book; I'm currently on Chapter 7. However, I have a question about database normalization from Chapter 6. I want to create a database of musical chords; for simplicity's sake, I'm limiting the chord types to the four basic triads and their inversions for now. The different types of data I'm including are the letter name (C, Db, F#, etc), the chord quality (Major, minor, Augmented and diminished), and the inversion (root position, 1st inversion and 2nd inversion). So my database would look something like this:

C Major Root Position

C Major 1st Inversion

C Major 2nd Inversion

C minor Root Position

C minor 1st Inversion

C minor 2nd Inversion

C Augmented Root Position

...

C diminished Root Position

...

C# Major Root Position

...

etc. x9

Db Major Root Position

...

etc. x9

As you can see, each column has a lot of repetition (there are 252 triads in the database), but since these are definitions, none of the data is likely to change at any point. My question is, in order to normalize this database, would I want to make several tables, like the following:

database name: chords

table 1:

letter_names

column 1:

letter_id

(primary key)

column 2:

letter_name

table 2:

quality

column 1:

quality_id

(primary key)

column 2:

quality

table 3:

inversions

column 1:

inversion_id

(primary key)

column 2:

inversion

table 4:

triads

column 1:

triad_id

(primary key)

column 2:

letter_id (foreign key)

column 3:

quality_id (foreign key)

column 4:

inversion_id (foreign key)

where table 4 cross-references the other 3 tables? Or would one table suffice, similar to table 4, but without the extra ID columns? In other words, a triads table with triad_id (primary key), letter_name, quality, and inversion as its four columns, and all the data in that single table?

What I'm actually confused about is the section on indexes. The book says generally one shouldn't index columns that have a very limited set of values (quality only has 4 possible values that repeat for different triads, inversion has 3 values, and letter_name has 21--higher than the others but still relatively limited). However, it also says that columns that are frequently used in WHERE or ORDER BY queries should be indexed, and I want to be able to display the triads database in several different ways and select particular groups of chords from the larger database. Also, the section before says columns that repeat a lot of data should be given their own table in order to adhere to normalization.

What would the best practice be in this case?

Also, I think I may not have reached this point in the book yet, but assuming I go with the 4-table structure, table 4 ends up with a bunch of empty columns--is it supposed to be like that, and I get my values by cross-referencing the other tables? Or do I have to get the values from the other three tables into table 4 somehow?

Thanks.

Jake

HartleySan · July 2, 2012

I think you have the right idea and the general layout of your tables is fine.

I'm not a music person, but why would the 4th table end up with a lot of empty values? I don't understand that part.

As for indexes, when there are not a lot of values in a particular column, creating an index for that column serves very little purpose because an index will create another copy of the table, making INSERT and UPDATE queries slower, but at the same time will barely speed up SELECT queries.

In your case, because your first three tables are so small, I wouldn't worry about indexes, as not having them won't make a difference in the speed of SELECT queries.

TheWildJacko · July 2, 2012

Thanks for the response. Actually, I phrased the part about the 4th table wrong. What I get when I look at triads in phpmyadmin is this message:

"MySQL returned an empty result set (i.e. zero rows). ( Query took 0.0005 sec )"

I want to be able to reference one table, triads, in a query and get values from the other three tables. I thought setting up inversions, quality and letter_name in the triads table as foreign keys to the equivalent columns in the other three tables would enable that, but then I'm not sure how to set up my queries to return the other values. I didn't put any rows in the foreign key columns of triads because I thought that would be redundant, and that the point of using foreign keys was that you could then cross-reference. I'm obviously missing something, I'm just not sure what.

Thanks for your help.

margaux · July 2, 2012

Look up JOINs as that will achieve what you're after. However, you will need values in the foreign keys columns, otherwise how will you cross-reference the records in the corresponding tables.

HartleySan · July 3, 2012

Like margaux said, you'll have to use joins to get the info you want. What this means is that you'll have to reference all of the tables, but don't worry about that since you'll be able to do it in one query (using joins).

If you read the part in the book about joins, hopefully it'll all make sense, but to offer some assistance beyond, "Go read the book!" I'll try and demonstrate how to write a query like the kind you're looking for. Keep in mind that there are many different syntaxes for joins, but the syntax below is my preferred method.

Let's say that you want all the info about all the various C chords. In that case, you can write a join as follows:

SELECT letter_names.letter_name, quality.quality, inversions.inversion
FROM letter_names, quality, inversions, triads
WHERE letter_names.letter_id = triads.letter_id AND quality.quality_id = triads.quality_id AND inversions.inversion_id = triads.inversion_id AND letter_names.letter_name = 'C';

On a side note, I might recommend not using the same name for table and column names (e.g., "quality" and "quality").

TheWildJacko · July 3, 2012

Thanks, that's really helpful!

OK, so I just inserted a test row into triads where letter_id = '1', quality_id = '1'and inversion_id = '1'. triad_id auto-increments, so I didn't bother including that in the INSERT query. Then I ran the example query you gave me and it returned C Major Root Position, which is the result I wanted. Looks like I definitely need to study up on joins!

So I guess there still needs to be a row in the triads table for every unique triad (252 of them in this case), but I would populate the columns with ID #'s which will refer to the full values in the other tables. And then I can use joins to cross-reference and retrieve the values.

It still seems like the triads table is going to have a lot of repetition in it:

triad_ID 1, letter_id 1, quality_id 1, inversion_id 1

triad_ID 2, letter_id 1, quality_id 1, inversion_id 2

triad_ID 3, letter_id 1, quality_id 1, inversion_id 3

triad_ID 4, letter_id 1, quality_id 2, inversion_id 1

triad_ID 5, letter_id 1, quality_id 2, inversion_id 2

triad_ID 6, letter_id 1, quality_id 2, inversion_id 3

etc...

But that's OK in this case?

-----

Update: It definitely works like a charm now that I've populated all the rows with ID numbers. All the C chords displayed. Thanks again!

HartleySan · July 3, 2012

Good to hear.

You could write a simple PHP script to make all the necessary inserts into the triads table. This would save you a lot of time and also any human-related errors.

And to address your concern, yes, the triads table will have a lot of repetition, but that's unavoidable when you have a many-to-many relationship. Also, no matter how much repetition you have, the key is that no two rows are the same (not counting the unique triad_ID), which is what really matters.

Sign In

Normalization

Recommended Posts

TheWildJacko

Link to comment

Share on other sites

HartleySan

Link to comment

Share on other sites

TheWildJacko

Link to comment

Share on other sites

margaux

Link to comment

Share on other sites

HartleySan

Link to comment

Share on other sites

TheWildJacko

Link to comment

Share on other sites

HartleySan

Link to comment

Share on other sites

Browse

Activity