Jump to content
Larry Ullman's Book Forums

Understanding Redundancy And Normalization


Recommended Posts

I'm working an a shopping system based on the cart and am trying to understand the consequences of the following redundancy/normalization issue:

 

The cart will have table promo_codes with discount codes. Each discount code only works for a specific item (client's request). So each row in the table consists of :

id

name (like 'SALE' or whatever)

amount

expires (datetime)

 

Now, thinking ahead a little, chances are the client is going to name 75% of the codes with such original names like 'SALE' 'PRESALE' and 'EARLYBIRD'

 

At first glance this looks like a violation of 2NF, because the table has many rows with repeating data in the same column (a non-key which is not dependent on the primary key).

 

However, upon further analyzation, it seems to me that this isn't so. The examples of violations brought in Larry's other book, PHP 6 and MySQL 5, were 'Humphrey Bogart' and 'Martin Scorsese' where the multiple instances of Martin Scorsese within the same table are referring to the same human being Martin Scorsese. Here however, the multiple instances of the code 'PRESALE' are not actually related at all, they are different codes, on different items, which just happen to share a common name.

 

So if I understand correctly, this would not be a violation of 2NF, as each instance of 'PRESALE' is individual and thus is dependent on its primary key.

 

However, at the end of the day, the table is still going to have 3000 instances of 'PRESALE'. In other words, it is redundant, but not a normalization violation.

 

Does this mean the database is completely safe from data integrity problems even with 3000 (different) instances of 'PRESALE'?

Link to comment
Share on other sites

table promo_codes with discount codes. Each discount code only works for a specific item (client's request). So each row in the table consists of :

id

name (like 'SALE' or whatever)

amount

expires (datetime)

 

Now, thinking ahead a little, chances are the client is going to name 75% of the codes with such original names like 'SALE' 'PRESALE' and 'EARLYBIRD'

 

At first glance this looks like a violation of 2NF, because the table has many rows with repeating data in the same column (a non-key which is not dependent on the primary key).

 

You could reduce redudancy and remove data integrity problems (even if it would not become and acutuall PROBLEM) this ways:

 

promo_codes (id, amount, expires)

promo_coupon_names (coupon_names_id, name)

promo_names (code_id, coupon_names_id*)

 

- All coupon names are stored in promo_coupon_names. promo_names are where you link the actual coupons with the names they should use.

 

You are right about violation of 2NF. You are also right about why it's a violation.

 

Here however, the multiple instances of the code 'PRESALE' are not actually related at all, they are different codes, on different items, which just happen to share a common name.

 

Not buying this. Stick with your original thought. However, this is JUST a matter of redundancy, and something you absolutely can live with. My great teacher says you should strive to make things Boyce-Codd (INF3), but sometimes you accept a little redundancy because it makes your life easier. This falls under that category. Fix it if you like and want to, but it won't affect your system much. Coupons are small in numbers, so the extra data is not something to worry to much about.

 

Does this mean the database is completely safe from data integrity problems even with 3000 (different) instances of 'PRESALE'?

 

Perhaps, maybe not. If you are ever going to allow users to search for items that are on sale, a misspelled "PERSALE" could possibly not be found by the query. That is an integrity problem. I have a solution on have to fix this above, but it's of course a minor problem.

Link to comment
Share on other sites

Personally, I don't think it's an issue at all if most customers choose unoriginal and repetitive names for their coupons.

 

Keep in mind that each coupon is a separate entity and has no dependency whatsoever on other coupons. That's the key. Each coupon is independent of other coupons, even if a lot of coupons happen to share the same name.

 

Think of it this way: If I customer were (hypothetically speaking) able to change the name of their coupon, the fact that the name of their coupon was changed from "SALE" to whatever would have no effect whatsoever on other customers' coupons named "SALE", right? As such, you're fine with the "redundancy", which is not really redundancy, but a by-product of unoriginal naming.

 

Anyway, that's my two cents.

Link to comment
Share on other sites

I agree with your original assessment, and with HartleySan's concurrence. However, Antonio has a point and it's always best to err on the side of normalization. If a client is almost always going to use one of a handful of values, I would be inclined to create those in a separate table. Then, on the user interface side, you present those existing options as a drop-down menu, or give the client the option of entering a new value. This would be both faster for the admin user and more foolproof.

 

In cases like this, where the database design could reasonably go either way, I use the UI considerations to help make my decision.

Link to comment
Share on other sites

 Share

×
×
  • Create New...