I took a crack at it this morning and came up with:

.*\b\d{3}[-. ]?\d{2}[-. ]?\d{4}\b.*

This will match 123-45-6789, 321.54-9876, 987654321, etc. It anchors on word boundaries, so a number with 10 or more digits will not match, nor will a string with trailing word characters like "123-45-6789abc", but a phrase like:

his social security number is 123-45-6789.

will match (the word boundary falls between the final 9 and the period). I suspect matching any bare 9-digit number is going to cause a lot of false positives. You can get rid of that case and match only SSNs separated by dashes, dots or spaces by removing the ?s from the regex:

.*\b\d{3}[-. ]\d{2}[-. ]\d{4}\b.*
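
As a quick sanity check, here is a minimal sketch that runs both patterns over the examples mentioned above. It uses Python's re module rather than RE2 itself, on the assumption that the two engines agree for the simple constructs used here; the names LOOSE, STRICT, SAMPLES and classify are made up for illustration.

```python
import re

# Loose pattern: separators are optional, so bare 9-digit numbers match too.
LOOSE = re.compile(r".*\b\d{3}[-. ]?\d{2}[-. ]?\d{4}\b.*")
# Strict pattern: a dash, dot, or space is required between the groups.
STRICT = re.compile(r".*\b\d{3}[-. ]\d{2}[-. ]\d{4}\b.*")

# Sample inputs taken from the discussion above.
SAMPLES = [
    "123-45-6789",
    "321.54-9876",
    "987654321",
    "1234567890",
    "123-45-6789abc",
    "his social security number is 123-45-6789.",
]


def classify(text):
    """Return (loose_hit, strict_hit) for one line of text."""
    loose_hit = LOOSE.fullmatch(text) is not None
    strict_hit = STRICT.fullmatch(text) is not None
    return (loose_hit, strict_hit)


if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, classify(sample))
```

The only input where the two patterns disagree is 987654321, which the loose pattern accepts and the strict one rejects.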

Of course, this isn't as complex a regex as the example you started with, because RE2 does not support back references. Back references are used in the example you started with to say things like "match the first 3 digits of an SSN, but not if all 3 digits are zeroes."

This means that my example is likely to cause more false positives than yours. How many more is difficult to say without putting something in place.
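
If the all-zeroes cases turn out to matter, one possible way to get some of that filtering back is to spell out the allowed digits with plain alternation, which RE2 does accept. The sketch below is an assumption on my part, not a tested compliance rule: the names AREA, GROUP, SERIAL and looks_like_ssn are hypothetical, it only excludes all-zero groups, and it is checked here with Python's re module rather than RE2.

```python
import re

# Hypothetical building blocks (not part of the original rule). Each one
# spells out "these digits, but not all zeroes" using alternation only.
AREA = r"(?:00[1-9]|0[1-9]\d|[1-9]\d{2})"                   # 001 to 999
GROUP = r"(?:0[1-9]|[1-9]\d)"                               # 01 to 99
SERIAL = r"(?:000[1-9]|00[1-9]\d|0[1-9]\d{2}|[1-9]\d{3})"   # 0001 to 9999

# Same shape as the strict pattern: separators are required.
NONZERO_SSN = re.compile(
    r".*\b" + AREA + r"[-. ]" + GROUP + r"[-. ]" + SERIAL + r"\b.*"
)


def looks_like_ssn(text):
    """True if the text contains an SSN-shaped number with no all-zero group."""
    return NONZERO_SSN.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["123-45-6789", "000-45-6789", "123-00-6789", "123-45-0000"]:
        print(sample, looks_like_ssn(sample))
```

The pattern is longer and harder to read than the original, so it is probably only worth adopting if the monitoring data shows these cases actually occur.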

My suggestion is that you do some basic testing with my regex on gpilotdev. Once you're comfortable with it, implement a "monitoring" content compliance rule: one that BCCs a mailbox you are watching when outgoing messages match, but DOES NOT reject the message or redirect it entirely. That way you can see how many messages are caught, what the false positive rate looks like, and why. From there we can work to refine the regex further or possibly add some exception addresses.
