python - Code to replace emoticons with "SAD" or "HAPPY" does not work properly

2019-08-24 22:55:56  Views: 303  Source: Internet

Tags: python nltk text-processing


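I want to replace every happy emoticon in a text file with "HAPPY" and every sad emoticon with "SAD", but the code does not work properly. It does detect the emoticon (only :-) so far), but in the example below it does not replace the emoticon with the text. It just appends the text, and for a reason I cannot figure out it appends it twice.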

dict_sad={":-(":"SAD", ":(":"SAD", ":-|":"SAD",  ";-(":"SAD", ";-<":"SAD", "|-{":"SAD"}
dict_happy={":-)":"HAPPY",":)":"HAPPY", ":o)":"HAPPY",":-}":"HAPPY",";-}":"HAPPY",":->":"HAPPY",";-)":"HAPPY"}

#THE INPUT TEXT#
a="guys beautifully done :-)" 

for i in a.split():
    for j in dict_happy.keys():
        if set(j).issubset(set(i)):
            print "HAPPY"
            continue
    for k in dict_sad.keys():
        if set(k).issubset(set(i)):
            print "SAD"
            continue
    if str(i)==i.decode('utf-8','replace'):
       print i

Input text

a="guys beautifully done :-)"              

Output ("HAPPY" appears twice, and the emoticon does not go away)

guys
-
beautifully
done
HAPPY
HAPPY
:-)

Expected output

guys
beautifully
done
HAPPY

Solution:

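You are turning each word and each emoticon into a set, which means you are only checking whether their individual characters overlap. You most likely wanted an exact match instead: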

for i in a.split():
    for j in dict_happy:
        if j == i:
            print "HAPPY"
            continue
    for k in dict_sad:
        if k == i:
            print "SAD"
            continue
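
To make the cause of the duplicate "HAPPY" concrete, here is a minimal sketch of the two checks side by side. It is written for Python 3 (an assumption; the snippets in this post are Python 2), and the names `happy_keys`, `subset_matches` and `exact_matches` are only illustrative.

```python
# Python 3 sketch: why the set-based check in the question matches twice.
word = ":-)"
happy_keys = [":-)", ":)", ":o)", ":-}", ";-}", ":->", ";-)"]

# set(":)") is {":", ")"}, which is contained in set(":-)") == {":", "-", ")"},
# so the character-level subset test succeeds for two different keys.
subset_matches = [key for key in happy_keys if set(key).issubset(set(word))]

# Comparing whole strings only matches the emoticon that was actually typed.
exact_matches = [key for key in happy_keys if key == word]

print(subset_matches)  # [':-)', ':)']
print(exact_matches)   # [':-)']
```

The question's loop prints "HAPPY" once per subset match, then reaches the final print of the word itself, which is why the emoticon still shows up at the end.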

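You can iterate over a dictionary directly; there is no need to call .keys() there. You also do not seem to use the dictionary values at all, so you could simply test for membership: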

for word in a.split():
    if word in dict_happy:
        print "HAPPY"
    if word in dict_sad:
        print "SAD"

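At that point you might as well use sets instead of dictionaries. If you only need to know whether any emoticon occurs in the text, this reduces to: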

words = set(a.split())
if dict_happy.viewkeys() & words:
    print "HAPPY"
if dict_sad.viewkeys() & words:
    print "SAD"

This uses the dictionary view on the keys as a set. Still, it would be better to use real sets:

sad_emoticons = {":-(", ":(", ":-|", ";-(", ";-<", "|-{"}
happy_emoticons = {":-)", ":)", ":o)", ":-}", ";-}", ":->", ";-)"}

words = set(a.split())
if sad_emoticons & words:
    print "SAD"
if happy_emoticons & words:
    print "HAPPY"
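
The snippets in this answer use Python 2 print statements and dict.viewkeys(), which does not exist in Python 3. As a rough sketch for Python 3 readers: there, dict.keys() already returns a set-like view, so the same intersection can be written directly against it. The variables `found_happy` and `found_sad` are illustrative names, not part of the original answer.

```python
# Python 3 sketch: dict.keys() is a set-like view, so it supports "&" directly.
dict_sad = {":-(": "SAD", ":(": "SAD", ":-|": "SAD",
            ";-(": "SAD", ";-<": "SAD", "|-{": "SAD"}
dict_happy = {":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY", ":-}": "HAPPY",
              ";-}": "HAPPY", ":->": "HAPPY", ";-)": "HAPPY"}

a = "guys beautifully done :-)"
words = set(a.split())

# A non-empty intersection means at least one word is a known emoticon.
found_happy = bool(dict_happy.keys() & words)
found_sad = bool(dict_sad.keys() & words)

if found_happy:
    print("HAPPY")
if found_sad:
    print("SAD")
```

Note that this only reports whether an emoticon of each kind is present; it does not say where in the text it occurred.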

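If you want to remove the emoticons from the text, you have to filter the words, printing the label instead of the word whenever the word is an emoticon: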

for word in a.split():
    if word in dict_happy:
        print "HAPPY"
    elif word in dict_sad:
        print "SAD"
    else:
        print word

Or better still, combine the two dictionaries and use dict.get():

emoticons = {
    ":-(": "SAD", ":(": "SAD", ":-|": "SAD", 
    ";-(": "SAD", ";-<": "SAD", "|-{": "SAD",
    ":-)": "HAPPY",":)": "HAPPY", ":o)": "HAPPY",
    ":-}": "HAPPY", ";-}": "HAPPY", ":->": "HAPPY",
    ";-)": "HAPPY"
}

for word in a.split():
    print emoticons.get(word, word)

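Here I pass the current word as both the lookup key and the default value. If the current word is not an emoticon, the word itself is printed; otherwise SAD or HAPPY is printed.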
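
If the goal is a rewritten string rather than one printed word per line, the same dict.get() lookup can be wrapped in a small function. This is only a sketch under a few assumptions: it targets Python 3, `replace_emoticons` and `EMOTICONS` are hypothetical names, and emoticons are assumed to be separated from other words by whitespace.

```python
# Python 3 sketch; replace_emoticons is a hypothetical helper, not from the answer.
EMOTICONS = {
    ":-(": "SAD", ":(": "SAD", ":-|": "SAD",
    ";-(": "SAD", ";-<": "SAD", "|-{": "SAD",
    ":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY",
    ":-}": "HAPPY", ";-}": "HAPPY", ":->": "HAPPY",
    ";-)": "HAPPY",
}


def replace_emoticons(text):
    """Return text with each whitespace-separated emoticon replaced by its label."""
    # dict.get(word, word) falls back to the word itself when it is not an emoticon.
    return " ".join(EMOTICONS.get(word, word) for word in text.split())


print(replace_emoticons("guys beautifully done :-)"))  # guys beautifully done HAPPY
```

Because the text is split on whitespace and rejoined with single spaces, runs of spaces or newlines in the input are collapsed; an emoticon glued to another token (for example "done:-)") is left untouched.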

Source: https://codeday.me/bug/20190824/1712821.html
