How we handle code coverage and its CI checks when they fail

Hi all,

During the upgrade to SilverStripe 4 we adopted across the board to measure code coverage in our modules and provide CI status checks in GitHub letting us know when they change.

The default criteria for is (from what I can tell) an algorithmic based policy where:

  • overall project coverage shouldn’t decrease at all
  • the coverage for the patch you contribute should have at least the overall project coverage covered in it

If either of these conditions fail, the CI checks fail.

Generally speaking we always ignore these checks and merge anyway. This means that the overall “status checks” have often failed for PRs and commits in GitHub, even if Travis (which is what we really care about) is passing.

I see two solutions to this:

  1. We can adjust the default configuration to be more lenient on coverage changes: Codecov. This can be done globally for silverstripe repositories by using “organisation configuration” in the web UI.
  2. Remove the status checks altogether.

For option 1, we would need to decide what is the absolute baseline we’d expect before we actively say “no, this pull request doesn’t have enough test coverage for us to merge it yet” (in nicer words!). Perhaps an overall reduction of 10% coverage for the project would be OK, while allowing 0% coverage for the patch? I suggest this because sometimes private static changes will show up as 0% coverage, or fixing a typo somewhere.

Benefits for option 1: still surface code coverage metrics on every PR, configuration can be changed once for everything.

For option 2, we could still run coverage in a way like we do with framework now where it’s on a cron schedule only, but doesn’t run on every PR.

Benefits for option 2: Travis builds generally will be faster (since we’re not running coverage every time). Drawback: we have to change .travis.yml files in every repo, over time.

It’d be cool to get some thoughts on either option here!

1 Like

One example where this came up: Commit ⋅ silverstripe/silverstripe-versioned-admin

It adds a lot of PHP code to a new module which doesn’t have a lot of existing code. A lot of it is pretty straightforward form field stuff, and IMHO is low value to add coverage for. Some bits like doRestore() would be good to test, but even there the actual logic is already unit tested in core through RestoreAction::restore(). For this change, I would’ve rather have effort being put into end to end testing, which won’t show up in code coverage stats at all but is of higher value.

So in short, this is really hard to generalise. Maybe “option 3” could be: Leave checks as-is, but require either the pull request author or merger to state why they chose to ignore coverage failures in this case? We could add this to our “core committer house rules”?

Also, can we start having forum discussions on #general unless they are something we really don’t want the community to see or participate in? For example, Luke as the author of the commit above won’t be able to follow our reasoning here.

It adds a lot of PHP code to a new module which doesn’t have a lot of existing code

Sure, but the code that was there was well covered.

Side note: I’ve moved the thread into General :slight_smile:

As long as we take the coverage into account, even if we say “I’m merging this despite coverage failing due to x” then at least it provides some value. But just ignoring it is not cool.

If we see lots of failures that are dismissed because the value is too small to be meaningful, then we can update our failure threshold.

Yeah that sounds fair.

Another consideration I forgot to mention is whether we should start including React test coverage in these metrics, given that an increasing amount of our core code is not included in the metric.

@flamerohr mentioned a while back that he’s had a personal project where he’s merged PHP and Jest test coverage reports for

But just ignoring it is not cool.

I think we should set the codecov settings to the point where it would be “not cool” - and then everybody can know not to ignore it. Even if the codecov checks are green, you can still see the amount that coverage has been reduced and say something like “Although your codecov checks are green there is some highly functional code here and I think you should write some more tests”

I don’t think I merged - so much as just upload twice :slight_smile:

Right, so just handles that itself?