#7 ✓resolved
Georg Ledermann

Duplicate rows in result set

Reported by Georg Ledermann | October 14th, 2008 @ 07:27 AM

Given this statement

User.find :all,
          :conditions => { :any => true,
                           :name_contains => 'foo',
                           :orders => { :title_contains => 'foo' } }

the following SQL query is generated:

SELECT users.*
FROM users
LEFT OUTER JOIN orders ON (orders.user_id = users.id)
WHERE users.name LIKE '%foo%'
OR orders.title LIKE '%foo%'

But this is not correct, because if a user has more than one matching orders, they are all included in the result set, which gets too large.

In my opinion a "DISTINCT" is needed, so the query has to be:

SELECT DISTINCT users.*
FROM users
LEFT JOIN ...

BTW, of course you can do it the explicit way:

User.find :all,
          :select => 'DISTINCT users.*',
          :conditions => ...

But it would be better to do this implicit within searchgasm, because the DISTINCT slows down the query and should only by made if needed (e.g. if there are LEFT JOINS generated)

Comments and changes to this ticket

  • Ben Johnson

    Ben Johnson October 14th, 2008 @ 12:07 PM

    • State changed from “new” to “open”

    Right, I had to make a change a few weeks ago. What I was doing before was adding in "group by". PostgreSQL was complaining about the group by statement because it wanted all of the columns in the group by statement. It was really strange.

    Anyways, that's when I realized searchgasm really shouldn't modify your query in that respect.

    Here's my general rule of thumb:

    all, find, first, count, etc is all AR's domain

    new_search and calling methods off of the search object is my domain

    That being said, I don't want to change default behavior in AR's domain. But I can "enhance" it. As long as it doesn't change the default behavior, I am fine. What I was doing before was adding in a "group by" statement. This is changing the default behavior. So is adding in distinct.

    If you didn't have searchgasm installed you would have to issue a group or distinct explicitly. Searchgasm shouldn't change that. So this issue is really an issue that AR should be handling.

    My "test" for issues like this is to act as if searchgasm wasn't installed, and then someone installed searchgasm. It should not change how the pre-existing queries acted. Take this example:

    User.all(:joins => :orders, :conditions => {:orders => {:total => 100}})
    

    They would get duplicate results. Maybe they want duplicate results. Regardless, that's not for me to decide. If you installed searchgasm and it changed how that search acted, it could cause problems.

    See where I'm coming from here?

    But when you do the EXACT same search with new_search I remove duplicates. Because that is my domain and I know anyone issuing that command is trying to get uniq results. No one was using new_search before searchgasm was released.

    So to sum all of this up, when playing around in the default AR domain (all, find, first, count, etc) you need to either issue a group by or a distinct explicitly, just like you would without searchgasm.

    What do you think?

  • Georg Ledermann

    Georg Ledermann October 14th, 2008 @ 04:36 PM

    Ben, I agree with you about the different domains of AR and searchgasm. You are right, changing the query in behind may confuse the user.

    Regarding your example:

    User.all( :joins => :orders, :conditions => { :orders => { :total => 100 } })
    

    Here, the user defines a condition based on orders, not more. He didn't explicity add any join, so in my opinion he expects a uniqe result set. Searchgasm adds the join which causes duplicate results, so it should add a DISTINCT, too.

    So, what about this rule of thumb: If searchgasm adds a join which was not already there, a DISTINCT should be added, too.

    Regarding DISTINCT vs. GROUP BY: I prefer using DISTINCT, because GROUP BY should be used for aggregation, IMHO. Isn't DISTINCT more ANSI-SQL, too? In addition, using DISTINCT gives better compatiblity with the plugin "will_paginate" (which I use in conjunction with searchgasm), see this commit: http://github.com/mislav/will_pa...

    For MySQL there is some docu about this: http://dev.mysql.com/doc/refman/...

  • Georg Ledermann

    Georg Ledermann October 14th, 2008 @ 04:41 PM

    Oops, where is the "edit" button in Lighthouse? Sorry for confusion, the example is wrong.

    User.all( :joins => :orders, :conditions => { :orders => { :total => 100 } })
    

    This query should NOT be changed, no DISTINCT should be added because the user has defined the JOIN, so he expects duplicate results.

    But take this example:

    User.all( :conditions => { :orders => { :total => 100 } })
    

    Here no JOIN is defined, so searchgasm has to add it. And for this case, the DISCTINT should be added, too.

  • Ben Johnson

    Ben Johnson October 14th, 2008 @ 05:54 PM

    You make some good points and I agree. Let me see what I can do to get this added properly. I feel like the whole "auto joins" madness is searchgasm is getting a little out of control. I want to clean this up and simplify it, and in the process it might be easier to add something like this in.

  • Ben Johnson

    Ben Johnson October 15th, 2008 @ 04:46 AM

    Alright, maybe you can help me out with this. I can't modify the select option because AR ignores it when you provide the include option.

    User.all(:include => "orders", :conditions => "orders.total = 100")
    

    will result in:

    SELECT "users"."id" AS t0_r0, "users"."created_at" AS t0_r1, "users"."updated_at" AS t0_r2, "users"."account_id" AS t0_r3, "users"."parent_id" AS t0_r4, "users"."first_name" AS t0_r5, "users"."last_name" AS t0_r6, "users"."active" AS t0_r7, "users"."bio" AS t0_r8, "orders"."id" AS t1_r0, "orders"."created_at" AS t1_r1, "orders"."updated_at" AS t1_r2, "orders"."user_id" AS t1_r3, "orders"."total" AS t1_r4, "orders"."description" AS t1_r5, "orders"."receipt" AS t1_r6 FROM "users"  LEFT OUTER JOIN "orders" ON orders.user_id = users.id     WHERE (orders.total = 100)
    

    See what I mean? I'm not even sure how I would wrap a distinct around that without digging into AR. I don't really mind digging into AR either, as long as it isn't complicated and doesn't make searchgasm brittle.

    So if we take the other route, which is group by, PostgresSQL gives me all kinds of errors. It's a little rediculous how it works. But if you do:

    select users.* from users
    

    The following will give you an error:

    select users.* from users group by users.id
    

    No idea why, but it does. I think it wants each of the users columns or something along those lines. I gave up on it because I felt like adding group by was dirty anyways.

    Lastly, I looked at the commit for will_paginate. The code was MySQL specific. Each database quotes its table names and columns differently. Most people don't quote when they are writing a query by hand, but that could be a problem for it. Also, that code is for count queries, not searching.

    Anyways, do you have any ideas on how to conquer this?

  • Georg Ledermann

    Georg Ledermann October 15th, 2008 @ 10:33 AM

    Seems to be there are lots of problems. I agree with you that the "GROUP BY" way is bad, because some databases don't allow it, I think because of missing aggregations in the select. So far as I remember, FirebirdSQL doesn't allow this, too.

    BTW, combining :include and :joins may conflict with the current version of searchgasm. Try this:

    User.all(:include => "orders", :conditions => { :orders => { :total => 100 }}
    
    
    => ActiveRecord::StatementInvalid: Mysql::Error: Not unique table/alias

    In summary, what about this:

    • If the user gives an :include option, no uniq result set is possible. Sad, but acceptable. Indeed, searchgasm should not add additional join for the conditions, but use the existing include (but perhaps this is another story)

    • If there is no :include option given and searchgasm has added JOINS, the :select option should be set to "DISTINCT tablename.*', which should be possible.

  • Ben Johnson

    Ben Johnson October 15th, 2008 @ 12:19 PM

    I think those are reasonable rules. What about this.

    If an include is provide, then remove any conflicting "auto" joins that searchgasm adds. This way its a little more flexible. But heres the big problem that I have been racking my brain over for hours. Take this example:

    User.all(:joins => {:orders => :line_item}, :include => :orders)
    

    Those conflict. The only way to properly solve this is move the entire joins statement over to the :include. Now I am preloading all line_items too, which could makes things very slow. I don't like the idea of preloading things by surprise.

    So here is my solution, and I'm trying to find an elegant way to do this. I want to alter ActiveRecord, so that it wont add duplicate joins when it comes to includes. So if an identical join was already added, still preload the objects, but don't include the join sql into the query. Now I don't want to do anything on the searchgasm side.

  • Ben Johnson

    Ben Johnson October 15th, 2008 @ 01:52 PM

    You're all set. The new version solves the :include conflict. I have not tackled the unique records yet. I will do so now.

  • Ben Johnson

    Ben Johnson October 15th, 2008 @ 03:14 PM

    I'm also fairly certain the if includes are provided you get unique results anyways. So this should work out.

  • Georg Ledermann

    Georg Ledermann October 15th, 2008 @ 04:43 PM

    Great, it all works fine and it seems to be a very nice solution. Thank you for your hard work!

  • Ben Johnson

    Ben Johnson October 15th, 2008 @ 05:27 PM

    • State changed from “open” to “resolved”

    Cool, this is all set

Please Sign in or create a free account to add a new ticket.

With your very own profile, you can contribute to projects, track your activity, watch tickets, receive and update tickets through your email and much more.

New-ticket Create new ticket

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile ยป

Provides common named scopes and object based searching.

People watching this ticket

Referenced by

Pages