Search on many-many relations (e.g. tags)

Let’s say I have a many-many relation between two DataObjects: Bookmark and Category. Is there a way to query for something link “all bookmars in category a and b or in category c and dwithout iterating over all bookmars?

All I found is an old article by Philipp Keller but I’m looking for a ready to use solution for SilverStripe (if it exists) before diving into raw SQL queries.

Hey @ntd

I don’t think the ORM let’s us do quite that out-of-the-box, but it can be used to get you what you’re after without raw SQL :slight_smile:

In SS4, I’d setup the many-many relationship to use the many_many_through functionality, so that you’ve got a relationship mapping object that’s accessible to the ORM like any other DataObject (rather than pissing about with many_many_extraFields).

Let’s say that the many_many_through relationship object is called BookmarkCategoryMap and we keep it simple so its fields are basically ID, BookmarkID and CategoryID.

You could then:

// get the IDs of bookmarks per category (n.b. i've just made up category ID values)
$bookmarkIDsCatA = BookmarkCategoryMap::get()->filter('CategoryID', 1)->columnUnique('BookmarkID');
$bookmarkIDsCatB = BookmarkCategoryMap::get()->filter('CategoryID', 2)->columnUnique('BookmarkID');
$bookmarkIDsCatC = BookmarkCategoryMap::get()->filter('CategoryID', 3)->columnUnique('BookmarkID');
$bookmarkIDsCatD = BookmarkCategoryMap::get()->filter('CategoryID', 4)->columnUnique('BookmarkID');

// create an array of bookmark IDs that are in both categories, per a/b and c/d group
$bookmarkIDsCatsAandB = array_intersect_assoc($bookmarkIDsCatA, $bookmarkIDsCatB);
$bookmarkIDsCatsCandD = array_intersect_assoc($bookmarkIDsCatC, $bookmarkIDsCatD);

// in those two arrays you have the list of IDs of bookmarks you want to return, so now;

// merge those two arrays
$bookmarkIDsCatsAandBorCandD = array_merge($bookmarkIDsCatsAandB, $bookmarkIDsCatsCandD);

// return to the database for a quick ID-based lookup of your bookmark table
$bookmarks = Bookmark::get()->filter('ID', $bookmarkIDsCatsAandBorCandD);

Should be workable, have implemented something similar in spirit recently and has been working fine. Rather than getting and then looping over a heap of objects, use the ORM to build your base lookups but then just grab the IDs from the results via ->columnUnique(). Use PHP’s array functions to prune them to your specific conditions, and then use the resulting array of IDs to perform the lookup on the DataObject you’re querying for.

I hope that helps!

1 Like

I’m not sure your solution will scale (especially on the memory side) but it is certainly a clever usage of SQL and PHP interaction: many thanks for sharing.

To query for this you can query with dot notation in the filter column.

An example would be -

class Bookmark extends DataObject{
	private static $many_many = [
		'Categories' => Category::class
	];
}

class Category extends DataObject{
	private static $db = [
		'Title' => 'Varchar(255)'
	];
	
	private static $belongs_many_many = [
		'Bookmarks' => Bookmark::class
	];
}

.....


Bookmark::get()->filter(['Categories.ID' => [1,2]])

or

Bookmark::get()->filter(['Categories.Title' => ['Category A', 'Category B']])

Yes, I’am aware of that: your query returns bookmarks in category 1 OR 2.

My problem is with intersections, e.g. how do you return bookmarks in category 1 AND 2 with the ORM? More generally, how do you return generic AND/OR combinations, e.g. bookmarks in category (1 OR 2) AND (3 OR 4)?

In plain SQL I would write the latter with subqueries, e.g. something along these lines:

SELECT *
    FROM Bookmark B LEFT JOIN BookmarkCategory BC
    ON B.ID = BC.BookmarkID
WHERE BC.CategoryID IN (1, 2) AND B.ID IN (
    SELECT B2.ID
        FROM Bookmark B2 LEFT JOIN BookmarkCategory BC2
        ON B2.ID = BC2.BookmarkID
    WHERE BC2.CategoryID IN (3, 4)
);

Being it a quite common feature, I was hoping there was something already done, such as a module or something.

I believe if you add an additional filter it will give you the ‘and’ statement.

Bookmark::get()->filter(['Categories.ID' => 1])->filter(['Categories.ID'=>2]);

You could also try a not exclude so -

Bookmark::get()->exclude(['Categories.ID:not' => [1,2]]);
1 Like

I believe if you add an additional filter it will give you the ‘and’ statement.

I think you are right. Hence (1 OR 2) AND (3 OR 4) could be rewritten as

Bookmark::get()
    ->filter('Categories.ID' => [ 1, 2 ])
    ->filter('Categories.ID' => [ 3, 4 ]);

This would probably solve all my problems, and in an elegant way! It seems so logical I don’t understand why it did not pop out before. I will report if everything works as expected.

After testing

Unfortunately it does not work: the result of the query is always an empty set. See post #13 for technical details.

This is 3 queries instead of one, and I’m not even sure it will work. Closest I can think of with just filters.

$abIDs = Bookmark::get()->filter([
    'Categories.Title' => 'A',
    'Categories.Title' => 'B',
])->column('ID');

$cdIDs = Bookmark::get()->filter([
    'Categories.Title' => 'C',
    'Categories.Title' => 'D',
])->column('ID');

$bookmarks = Bookmark::get()->byIDs(array_unique(array_merge($abIDs, $cdIDs)));
1 Like

Yes, this is pretty similar to what @dizzystuff suggested, but leveraging the query instead of doing an array_merge.

Using columnUnique('ID') instead of column('ID') will avoid duplicates from the start :wink:

I think column(‘ID’) would already give you unique IDs as each Bookmark would only appear once. The array_unique is in case there are common IDs between the two separate arrays.

For some reason I saw your code similar to the other one… but it is not.

The filter call does not work because there is a duplicate array key. Surprisingly PHP does not error out but it silently runs your code applying only the second condition ('Categories.Title' => 'B'). Using two filter calls did not solve the issue either: it returns an empty set :cold_sweat:

Here is my test code:

use SilverStripe\ORM\DataObject;

class Category extends DataObject
{
    private static $db = [
        'Title' => 'Varchar',
    ];
    private static $many_many = [
        'Bookmarks' => Bookmark::class,
    ];

    public function requireDefaultRecords()
    {
        parent::requireDefaultRecords();
        foreach (['A','B','C','D'] as $id => $title) {
            $row = Category::create();
            $row->ID = $id + 1;
            $row->Title = $title;
            $row->write();
        }
    }
}

class Bookmark extends DataObject
{
    private static $db = [
        'Title' => 'Varchar',
    ];
    private static $belongs_many_many = [
        'Categories' => Category::class,
    ];

    public function requireDefaultRecords()
    {
        parent::requireDefaultRecords();
        foreach (['A','AB','ABC','ABCD','BCD','CD','D'] as $title) {
            $row = Bookmark::create();
            $row->Title = $title;
            $row->write();
            foreach (str_split($title) as $ch) {
                $row->Categories()->add(ord($ch) - ord('A') + 1);
            }
        }
    }
}

And here are my results.

// Ok: [ A, AB, ABC, ABCD ]
Bookmark::get()->filter('Categories.Title', 'A')->column('Title');

// Ok: [ AB, ABC, ABCD, BCD ]
Bookmark::get()->filter('Categories.Title', 'B')->column('Title');

// Error: []
Bookmark::get()->filter('Categories.Title', 'A')->filter('Categories.Title', 'B')->column('Title');

// Error: []
Bookmark::get()->filter([ 'Categories.Title' => 'A', 'Categories.ID' => 2 ])->column('Title');

// Ok but with duplicates: [ A, AB, ABC, ABCD, AB, ABC, ABCD, BCD ]
Bookmark::get()->filter('Categories.Title', ['A', 'B'])->column('Title');

// Ok: [ A, AB, ABC, ABCD, BCD ]
Bookmark::get()->filter('Categories.Title', ['A', 'B'])->columnUnique('Title');

// Error: []
Bookmark::get()->filter('Categories.Title', ['A', 'B'])->filter('Categories.Title', ['C', 'D'])->column('Title');

There is something weird with many-many queries or I fail to see something big.

Is that what you want though? From your original post I though you were wanting (A AND B) OR (C AND D). Even if you changed this to (A OR C) AND (B OR D) it wouldn’t be equivalent as this would also capture combinations A+D and C+B.

1 Like

Sorry about that: I surely need (A OR B) AND (C OR D) and I’ll likely need (A AND B) OR (C AND D).

In the latter case I’m more than happy to do A AND B and C AND D separately and merge the results (as you did in your solution) as the results of the subqueries are all valid items anyway.

I’m guessing those ones that give you an empty result set are working (not erroring out) but they’re just not working as you hope. Here I think the filters are being merged to find Categories where Title = A AND Title = B. That’s impossible, hence the empty result set. :frowning:

We need an ORM expert. @TractorCow? :grinning: