Jump to content
Larry Ullman's Book Forums

Mysql Group By, Where In, Having Clause Trouble


Recommended Posts

So nice to be able to get rid of my old forum name :) but I'm having some MySQL problems and I really can't figure out - I keep thinking I must be missing something simple.

 

I have a set of (37) fully normalised tables to store the data with integrity - these normalised tables drive a non-normalised search table that I conduct full text queries on. Fulltext searches from this database are extracted using INSERT INTO... SELECT to place them into a temporary table. The temporary table is then used to extract the rows required for pagination based on an INNER JOIN to the relevant tables and to calculate the number of total results for pagination. (COUNT(*) was not used for this because I have to extract all the IDs into the temporary table anyway so MYSQLI_AFFECTED_ROWS seemed like a more efficient solution).

 

The reason I have to store them in a temporary table is because we then have to run a series of COUNT queries to determine which categories the results fall into and present these back with a series of checkboxes. The user can then select multiple sub-categories from the sidebar to narrow down their search. At this point I think an example is needed:

 

  1. The user does a generic search for England
  2. Search returns 330 results with a series of refinement checkboxes for habitat types: (Grassland - 48, Shrubland - 45, Wetlands - 33)
  3. User can make one or multiple selections from the checkboxes

 

If they make just one selection a WHERE clause similar to: habitat_type_id = 4 is inserted into the query - which works fine. But if the user makes two selections (either at the same time or after the first filter has been added) then it doesn't work. The query becomes eg. habitat_type_id = 4 AND habitat_type_id = 11. This won't return anything because the row being examined will only ever have one value in. Next I tried a WHERE... IN clause but that returns documents that have a habitat_type_id of 4 or 11 not both.

 

I thought maybe aliasing the columns was the way to go but I've had no luck - I've also tried a rather hacky implementation of GROUP_CONCAT and HAVING to try and extract them. Seems there must be a simpler technique/query!?

 

I think that's enough information but if you need to know anything else just ask - I've included the query which breaks below:

 

RETURNS: Nothing

 

INSERT IGNORE INTO tmp_references SELECT reference_id, ( (1.75 * (MATCH(reference_title) AGAINST ('england' IN BOOLEAN MODE))) + (1 * (MATCH(reference_text) AGAINST ('england' IN BOOLEAN MODE))) + (0.75 * (MATCH(reference_tags) AGAINST ('england' IN BOOLEAN MODE))) + (1 * (MATCH(reference_authors) AGAINST ('england' IN BOOLEAN MODE))) + (2 * (MATCH(reference_keywords) AGAINST ('england' IN BOOLEAN MODE))) ) AS relevance FROM nn_references LEFT JOIN habitats_references USING (reference_id) WHERE (MATCH (reference_title, reference_text, reference_tags, reference_authors, reference_keywords) AGAINST ('england' IN BOOLEAN MODE)) AND habitat_type_id = 4 AND habitat_type_id = 11 GROUP BY reference_id HAVING relevance > 0 ORDER BY relevance DESC

 

RETURNS: Documents matching either not both

 

INSERT IGNORE INTO tmp_references SELECT reference_id, ( (1.75 * (MATCH(reference_title) AGAINST ('england' IN BOOLEAN MODE))) + (1 * (MATCH(reference_text) AGAINST ('england' IN BOOLEAN MODE))) + (0.75 * (MATCH(reference_tags) AGAINST ('england' IN BOOLEAN MODE))) + (1 * (MATCH(reference_authors) AGAINST ('england' IN BOOLEAN MODE))) + (2 * (MATCH(reference_keywords) AGAINST ('england' IN BOOLEAN MODE))) ) AS relevance FROM nn_references LEFT JOIN habitats_references USING (reference_id) WHERE (MATCH (reference_title, reference_text, reference_tags, reference_authors, reference_keywords) AGAINST ('england' IN BOOLEAN MODE)) AND habitat_type_id IN (4,11) GROUP BY reference_id HAVING relevance > 0 ORDER BY relevance DESC

 

Thank you in advance guys

 

PS Execution speed is of the essence with this project :s not sure if that makes any difference to the solution.

Link to comment
Share on other sites

Well I've managed to come up with a solution that kind of works - using the result of GROUP_CONCAT CAST to CHAR in the HAVING clause doesn't work because they become a string not a set of integers. The solution I've got is to run a WHERE... IN clause to get rows that match either category and then SELECT COUNT(DISTINCT habitat_type_id) AS habitats and add a HAVING clause like HAVING habitats = 2 (with 2 being the number of arguments passed in the WHERE... IN clause.

 

So the query looks something like this:

 

SELECT reference_id, COUNT(DISTINCT habitat_type_id) AS habitats, ( (0.2 * (MATCH(reference_title) AGAINST ('england' IN BOOLEAN MODE))) + (0.25 * (MATCH(reference_text) AGAINST ('england' IN BOOLEAN MODE))) + (0.05 * (MATCH(reference_tags) AGAINST ('england' IN BOOLEAN MODE))) + (0.3 * (MATCH(reference_authors) AGAINST ('england' IN BOOLEAN MODE))) + (0.2 * (MATCH(reference_keywords) AGAINST ('england' IN BOOLEAN MODE))) ) AS relevance FROM nn_references LEFT JOIN habitats_references USING (reference_id) WHERE (MATCH (reference_title, reference_text, reference_tags, reference_authors, reference_keywords) AGAINST ('england' IN BOOLEAN MODE)) AND habitat_type_id IN (4,11) GROUP BY reference_id HAVING relevance > 0 AND habitats = 2 ORDER BY relevance DESC

 

I also since found a solution with JOINS in a post on intersection that seems to work. The only problem is that in theory a user can make a OR selection too:

 

e.g. Initially choose a habitat type of foo OR bar - then once they start to refine the results using the checkboxes these become (foo AND baz) OR (bar AND baz)

 

Add another refinement and in theory it would become:

 

(foo AND baz AND qux) OR (bar AND baz AND qux)

 

Am I right in thinking that now I'd have to essentially query twice and merge the result sets? Would I do this in MySQL or server side in PHP e.g. Write into an array using the document ID as the array key?

 

Thank you for any help/guidance you can give

Link to comment
Share on other sites

Hi Larry,

 

That's fine I realised it wasn't the most straight forward thing after a few hours racking my brains and hours spent googling with no real luck - we're launching the website with just AND selections for now and will build this functionality in as soon as we get a solution. Thank you for your help with this.

Link to comment
Share on other sites

 Share

×
×
  • Create New...