Over the years, bloggers on Language Log and elsewhere have catalogued ways of avoiding taboo and other offensive vocabulary in print. These range from handcrafted strategies, like circumlocution and euphemism, through a variety of substitution techniques, to partially automated avoidance schemes (straightforward blocking of postings and messages containing the offending items, several types of asterisking schemes, and the like).
Here’s an automated substitution scheme reported by Martin R in a comment on my “bad bingo words” posting:
My son used to hang out in a chatroom where bad language was modified automatically. “Fuck” became “hug”, “fucking” became “hugging”.
To which PaddyK replied:
I like the “hug” filter concept! “If you don’t get your hugging donkey over here right now I’ll hugging kiss you!”
The hug substitutes sound silly, and there is a very real possibility that such substitution would simply invest hug with an obscene aura it didn’t have before. Beyond that, this simple example illustrates some of the well-known complexities of automated filtering. (For some related complexities, see the Language Log postings on automated asterisking in iTunes.)
Here’s the problem: if the filtering routine just does substring replacement, then for fucked and fucking you’ll get huged and huging instead of hugged and hugging. So either the routine has to incorporate some spelling conventions of English (here, doubling the final g of hug before -ed and -ing), or the replacement dictionary has to have separate entries for all the forms, matched as whole words. The second solution is probably necessary in any case, to avoid absurdities like replacing the turd of Saturday with something else (or with four asterisks, or blocking the message entirely).
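For the programmers in the audience, here is a minimal Python sketch of the two approaches. It is not the chatroom's actual code, which I haven't seen; the function names, the replacement table, and the choice of four asterisks for turd are all my own assumptions, made up for illustration.

```python
import re

# Hypothetical replacement table: one entry per inflected form, so that
# no English spelling rules (such as consonant doubling) are needed.
REPLACEMENTS = {
    "fuck": "hug",
    "fucks": "hugs",
    "fucked": "hugged",
    "fucking": "hugging",
    "turd": "****",  # assumed replacement; the original gives none
}


def naive_filter(text, bad, good):
    """Plain substring replacement, with no regard for word boundaries."""
    return text.replace(bad, good)


# Match only whole words that appear in the table, ignoring case.
_WORDS = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in REPLACEMENTS) + r")\b",
    re.IGNORECASE,
)


def whole_word_filter(text):
    """Replace whole words found in REPLACEMENTS; leave everything else alone."""
    def swap(match):
        word = match.group(0)
        replacement = REPLACEMENTS[word.lower()]
        # Keep an initial capital if the original word had one.
        return replacement.capitalize() if word[0].isupper() else replacement

    return _WORDS.sub(swap, text)


print(naive_filter("fucked and fucking", "fuck", "hug"))  # huged and huging
print(naive_filter("Saturday", "turd", "****"))           # Sa****ay
print(whole_word_filter("fucked and fucking"))            # hugged and hugging
print(whole_word_filter("Saturday"))                      # Saturday
```

The whole-word version gets the spelling right only because every form is listed separately; any form missing from the table passes through unfiltered, which is the price of not teaching the routine English spelling.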