Add get censored words & censor middle only features by emso-c · Pull Request #36 · snguyenthanh/better_profanity

emso-c · 2021-07-15T11:19:58Z

This provides a solution for issue #34. Sorry, I messed up my branches, so this commit is merged with the other PR I created (#35).

Again, it doesn't break anything, and the new behavior only applies when get_censored_words is True.

In that case censor returns a tuple of (str, list): the str is the censored text and the list contains the words that were censored.

Usage:

from better_profanity import profanity

if __name__ == "__main__":
    profanity.load_censor_words()
    
    text = "test fucking shit"
    censored_text, censored_words = profanity.censor(text, get_censored_words=True)
    print(censored_words)
    # ['fucking', 'shit']

snguyenthanh

It would be great if you could write some tests for the functions you added as well.

snguyenthanh · 2021-07-20T02:58:12Z

    ## PUBLIC ##

-    def censor(self, text, censor_char="*"):
+    def censor(self, text, censor_char="*", middle_only=False, get_censored_words=False):


I think it would be better to have a separate get_censored_words(text) function, since it is not obvious what censor(get_censored_words=True) returns. You could share some or most of the code with the censor function.

Oh I thought it would be better to not repeat the same code, but it's true that it would potentially cause confusion. I'll work on it.

That's great. You are right that it would be better to not repeat the same code. But we could also create a function that is used by both censor and get_censored_words to determine which words are profane.
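The shared-helper idea suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions: `find_profane_spans`, the `CENSOR_WORDS` set and the simple `\w+` matching are hypothetical stand-ins, not the library's actual internals or word list.

```python
import re

# Hypothetical stand-in for the library's word list (assumption, not real data).
CENSOR_WORDS = {"darn", "heck"}


def find_profane_spans(text, censor_words=CENSOR_WORDS):
    """Return (start, end, word) for every word of text found in censor_words."""
    spans = []
    for match in re.finditer(r"\w+", text):
        if match.group().lower() in censor_words:
            spans.append((match.start(), match.end(), match.group()))
    return spans


def get_censored_words(text):
    """List the profane words, in order of appearance."""
    return [word for _, _, word in find_profane_spans(text)]


def censor(text, censor_char="*"):
    """Replace each profane word with four censor characters."""
    pieces = []
    cursor = 0
    for start, end, _ in find_profane_spans(text):
        pieces.append(text[cursor:start])  # untouched text before the match
        pieces.append(censor_char * 4)
        cursor = end
    pieces.append(text[cursor:])  # untouched tail
    return "".join(pieces)
```

Both public functions call the same detection helper, so the decision about which words are profane lives in one place and nothing is duplicated.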

snguyenthanh · 2021-07-20T02:59:59Z


-def get_replacement_for_swear_word(censor_char):
-    return censor_char * 4
+def get_replacement_for_swear_word(censor_char, n=4):


It's just my personal preference: could you replace n with a more descriptive name? It's quite unclear what n means when we call get_replacement_for_swear_word("-", n=2).
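A minimal sketch of what a more descriptive parameter might look like. The name `replacement_length` is an assumption made for illustration; the name actually chosen in the "Rename parameter" commit is not shown in this thread.

```python
def get_replacement_for_swear_word(censor_char, replacement_length=4):
    # replacement_length is a hypothetical name: how many censor
    # characters the replacement string should contain.
    return censor_char * replacement_length
```

With a name like this, a call such as get_replacement_for_swear_word("-", replacement_length=2) reads unambiguously at the call site.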

emso-c · 2021-07-20T09:17:46Z

I separated the functions as you suggested, but while writing unit tests and testing edge cases I realized that the current _hide_swear_words function merges multiple swear words into one when they're next to each other. Both features work well for single, isolated swear words, but they don't behave well in these kinds of situations.

Example with get_censored_words:

bad_text = "Dude, I hate shit. Fuck bullshit."
profanity.get_censored_words(bad_text)
>>>['shit', 'bullshit']
# It completely ignored "Fuck" since they're merged

bad_text = "That wh0re gave m3 a very good H@nD j0b."
profanity.get_censored_words(bad_text)
>>>['wh0re', 'H@nD']
# It didn't include "j0b" since they're separated with space

Example with middle_only (same issues):

bad_text = "Dude, I hate shit. Fuck bullshit."
profanity.censor(bad_text, middle_only=True)
>>>"Dude, I hate s**t b******t."
# It completely ignored "Fuck" since they're merged

bad_text = "That wh0re gave m3 a very good H@nD j0b."
profanity.censor(bad_text, middle_only=True)
>>>"That w***e gave m3 a very good H**D."
# It didn't include "j0b" since they're separated with space

To solve that, I simply added a check before merging swear words (only merge if: not (get_censored_words or middle_only)). The results are better and all the other unit tests still pass, but some of the coverage your method provided is lost. That means it will detect fewer swear words, which might result in inconsistencies.
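The guard described here can be illustrated with a toy sketch. It assumes the matches have already been grouped into runs of adjacent swear words; `group_matches` and that input shape are hypothetical, and the real _hide_swear_words internals are not shown in this thread.

```python
def group_matches(runs, get_censored_words=False, middle_only=False):
    """Merge each run of adjacent matches unless one of the new flags is set.

    runs is a list of lists, each inner list holding adjacent matched words
    (a hypothetical input shape chosen for this sketch).
    """
    should_merge = not (get_censored_words or middle_only)
    if should_merge:
        # Original behaviour: adjacent swear words collapse into one item.
        return [" ".join(run) for run in runs]
    # New behaviour: every word is reported or censored on its own.
    return [word for run in runs for word in run]
```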

Example with get_censored_words:

bad_text = "Dude, I hate shit. Fuck bullshit."
profanity.get_censored_words(bad_text)
>>>['shit', 'Fuck', 'bullshit']

bad_text = "That wh0re gave m3 a very good H@nD j0b."
profanity.get_censored_words(bad_text)
>>>['wh0re']
# It didn't include "H@nD j0b"

Example with middle_only (same issues):

bad_text = "Dude, I hate shit. Fuck bullshit."
profanity.censor(bad_text, middle_only=True)
>>>"Dude, I hate s**t. F**k b******t."


bad_text = "That wh0re gave m3 a very good H@nD j0b."
profanity.censor(bad_text, middle_only=True)
>>>"That w***e gave m3 a very good H@nD j0b."
# It didn't include "H@nD j0b"

Maybe we should just follow this method and warn users of these possible issues? I think it's a pretty mild edge case anyway, but it's up to you.
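For reference, the middle_only outputs quoted in this comment are consistent with keeping the first and last character of each word and masking everything in between. A minimal sketch of that masking, assuming words of two characters or fewer are left unchanged (the PR's actual handling of short words is not shown here, and `censor_middle` is a hypothetical name):

```python
def censor_middle(word, censor_char="*"):
    # Assumption: nothing to mask when the word has no middle characters.
    if len(word) <= 2:
        return word
    # Keep the first and last characters, mask the rest.
    return word[0] + censor_char * (len(word) - 2) + word[-1]
```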

emso-c added 2 commits July 15, 2021 12:29

Add 'censor middle only' feature

7c5cd79

Solution to issue snguyenthanh#34

cfe0f97

snguyenthanh requested changes Jul 20, 2021

snguyenthanh reviewed Jul 20, 2021

snguyenthanh mentioned this pull request Jul 20, 2021

Add 'censor middle only' feature #35

Closed

emso-c changed the title ~~Add get censored words feature~~ Add get censored words & censor middle only features Jul 20, 2021

emso-c added 2 commits July 20, 2021 10:42

Rename parameter

d7df69b

Seperate censor and get_censored_words

5ec1668

emso-c added 3 commits July 20, 2021 12:33

Fix issue with new methods

9afb7c9

Add test for new methods

9cab106

Lint with black

e0c5c11

emso-c wants to merge 7 commits into snguyenthanh:master from emso-c:issue-34

2 participants
