Add get censored words & censor middle only features #36
snguyenthanh left a comment:
It would be great if you could write some tests for the functions you added as well.
## PUBLIC ##

- def censor(self, text, censor_char="*"):
+ def censor(self, text, censor_char="*", middle_only=False, get_censored_words=False):
I think it would be better to have a separate get_censored_words(text) function, as it is not obvious what censor(get_censored_words=True) returns. You could share some or most of the code with the censor function.
Oh I thought it would be better to not repeat the same code, but it's true that it would potentially cause confusion. I'll work on it.
That's great. You are right that it would be better to not repeat the same code. But we could also create a function that is used by both censor and get_censored_words to determine which words are profane.
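One way to read this suggestion, as a minimal sketch: a single private helper locates the profane words, and both public methods build on it. The class name `TinyProfanity`, the helper `_find_profane_words`, and the word list are hypothetical and are not the library's actual internals; matching here is a plain case-insensitive whole-word lookup.

```python
import re


class TinyProfanity:
    """Toy stand-in for illustration only; not the library's implementation."""

    def __init__(self, words):
        self.words = {w.lower() for w in words}

    def _find_profane_words(self, text):
        # Hypothetical shared helper: yield a regex match for every word
        # in `text` that appears in the word list (case-insensitive).
        for match in re.finditer(r"\w+", text):
            if match.group().lower() in self.words:
                yield match

    def get_censored_words(self, text):
        # Return the matched words in order of appearance.
        return [m.group() for m in self._find_profane_words(text)]

    def censor(self, text, censor_char="*"):
        # Rebuild the text, replacing each matched word with four censor chars.
        pieces, last = [], 0
        for m in self._find_profane_words(text):
            pieces.append(text[last:m.start()])
            pieces.append(censor_char * 4)
            last = m.end()
        pieces.append(text[last:])
        return "".join(pieces)
```

With this layout, any change to how words are detected only has to be made in the helper, and the two public methods cannot drift apart.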
- def get_replacement_for_swear_word(censor_char):
-     return censor_char * 4
+ def get_replacement_for_swear_word(censor_char, n=4):
It's just my personal preference: could you replace n with a more descriptive variable name? It's quite unclear what n is when we call get_replacement_for_swear_word("-", n=2).
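For illustration, a sketch of what the rename could look like. The parameter name `length` is only a suggestion, and the function body is an assumption, since the diff excerpt shows only the new signature.

```python
def get_replacement_for_swear_word(censor_char, length=4):
    # `length` is a hypothetical, more descriptive name for `n`:
    # the number of censor characters in the replacement string.
    # The body is assumed; only the signature appears in the diff.
    return censor_char * length
```

A call such as `get_replacement_for_swear_word("-", length=2)` then documents itself at the call site.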
Separated the functions as you said, but while writing unit tests and testing edge cases, I just realized that the current

Example with get_censored_words:

    bad_text = "Dude, I hate shit. Fuck bullshit."
    profanity.get_censored_words(bad_text)
    >>> ['shit', 'bullshit']
    # It completely ignored "Fuck" since they're merged

    bad_text = "That wh0re gave m3 a very good H@nD j0b."
    profanity.get_censored_words(bad_text)
    >>> ['wh0re', 'H@nD']
    # It didn't include "j0b" since they're separated by a space

Example with middle_only (same issues):

    bad_text = "Dude, I hate shit. Fuck bullshit."
    profanity.censor(bad_text, middle_only=True)
    >>> "Dude, I hate s**t b******t."
    # It completely ignored "Fuck" since they're merged

    bad_text = "That wh0re gave m3 a very good H@nD j0b."
    profanity.censor(bad_text, middle_only=True)
    >>> "That w***e gave m3 a very good H**D."
    # It didn't include "j0b" since they're separated by a space

To solve that, I simply put a check before merging swear words (only merge if:

Example with get_censored_words:

    bad_text = "Dude, I hate shit. Fuck bullshit."
    profanity.get_censored_words(bad_text)
    >>> ['shit', 'Fuck', 'bullshit']

    bad_text = "That wh0re gave m3 a very good H@nD j0b."
    profanity.get_censored_words(bad_text)
    >>> ['wh0re']
    # It didn't include "H@nD j0b"

Example with middle_only (same issues):

    bad_text = "Dude, I hate shit. Fuck bullshit."
    profanity.censor(bad_text, middle_only=True)
    >>> "Dude, I hate s**t. F**k b******t."

    bad_text = "That wh0re gave m3 a very good H@nD j0b."
    profanity.censor(bad_text, middle_only=True)
    >>> "That w***e gave m3 a very good H@nD j0b."
    # It didn't include "H@nD j0b"

Maybe we should just follow this method and warn users of these possible issues? I think it's a pretty mild edge case anyway, but it's up to you.
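For readers following the middle_only outputs above, the per-word masking they imply can be sketched as follows. The helper name `censor_middle` is hypothetical and is not taken from the PR, and leaving words shorter than three characters unchanged is an assumption.

```python
def censor_middle(word, censor_char="*"):
    # Hypothetical helper: keep the first and last character and replace
    # everything in between with the censor character.
    # Words shorter than 3 characters have no middle, so this sketch
    # returns them unchanged (an assumption, not the PR's behaviour).
    if len(word) < 3:
        return word
    return word[0] + censor_char * (len(word) - 2) + word[-1]
```

Applied to single words from the examples, `censor_middle("wh0re")` gives `"w***e"` and `censor_middle("Fuck")` gives `"F**k"`. The sketch says nothing about how adjacent swear words are merged, which is the edge case discussed in this comment.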
Provided a solution for issue #34. Sorry, I kind of messed up with branches, so this commit is merged with the other PR I created (#35).
Again, it doesn't break anything and can only be used if get_censored_words is True.

It basically returns a tuple of (str, list), with the str being the original censored text and the list being the list of censored words.

Usage: