How to clean up broken [sitetree_link,id=] links

We have a client on Silverstripe 3.6 whose website has multiple broken links in the following format: [sitetree_link,id=]

Is this a common problem, does anyone know what causes it, and is there any way to easily clean up these links across the whole CMS/website?

I used this code (with verbose output & error checking). There’s probably a cleaner way of doing it, but this worked fine:

    // Find published pages whose content contains an empty link shortcode
    $affectedPages = Versioned::get_by_stage('SiteTree', 'Live')
        ->filter(['Content:PartialMatch' => '[sitetree_link,id=]']);

    foreach ($affectedPages as $page) {
        // Strip the empty shortcode and write straight back to the Live stage
        // (the draft copy of the page is not touched by this)
        $page->Content = str_replace('[sitetree_link,id=]', '', $page->Content);
        $page->writeToStage('Live');
    }

I don’t know about cleaning them up, but I think it’s caused by pages being deleted, or not published, while other pages still link to them.

All I can think of is the broken links report, to at least know which pages have the problem.

You can strip out empty links in onBeforeWrite() with a regex that searches for links with empty IDs.
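
A rough sketch of what such a regex could look like, as plain PHP with no framework calls. The function name stripEmptySiteTreeLinks() is made up for illustration, and it assumes the broken links are stored as <a href="[sitetree_link,id=]">...</a> with double-quoted attributes, so check that against your own content first:

```php
<?php
// Hypothetical helper (not part of Silverstripe): removes link shortcodes
// that have an empty id from a chunk of HTML content.
function stripEmptySiteTreeLinks($html)
{
    // 1. Unwrap anchors whose href is the empty shortcode, keeping the link text.
    //    Assumes double-quoted href attributes.
    $html = preg_replace(
        '/<a\b[^>]*href="\[sitetree_link,id=\]"[^>]*>(.*?)<\/a>/is',
        '$1',
        $html
    );

    // 2. Remove any empty shortcodes left outside of an anchor tag.
    return str_replace('[sitetree_link,id=]', '', $html);
}

$before = '<p>See <a href="[sitetree_link,id=]">our page</a> and '
        . '<a href="[sitetree_link,id=12]">this</a>.</p>';

echo stripEmptySiteTreeLinks($before);
// prints: <p>See our page and <a href="[sitetree_link,id=12]">this</a>.</p>
```

Links with a real ID are left alone. Inside onBeforeWrite() (for example in a DataExtension applied to SiteTree) you would then do something like $this->owner->Content = stripEmptySiteTreeLinks($this->owner->Content); so every save cleans the field.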

If there aren’t many pages affected, you can clean them up manually using the broken links report, like @firesphere suggested.

If the broken links report works, you could use it and then loop over the result set with a regex like @wmk suggested. Save the changes, and publish only if the page was already published before.

If the broken links report doesn’t give you results, I would consider exporting the 3 SiteTree tables (SiteTree, SiteTree_Live and SiteTree_versions) as SQL, cleaning up those files with a regex find and replace, and re-importing them. This is probably not a good idea, and you’d want to be very careful, but it’s what I would do. I’ve done this before when a whole bunch of links had to be updated and rewriting history wasn’t an issue.