
Python packages: a PyQuery web page parsing tutorial

2022-04-22 10:02:56  Source: Internet

Tags: pq, PyQuery, python, doc, html, print, pyquery


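1. Installation

  • PyQuery is a powerful and flexible library for parsing web pages.

  • It is a Python library modeled closely on jQuery.

  • Its syntax is almost identical to jQuery's, so the jQuery documentation is a useful reference for further operations.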

pip install pyquery

 

2. Initializing from a string

html = '''
<ul id="container">
    <li class="wow fadeIn">
        <div class="d-flex latest-small-thumb">
            <div class="post-thumb d-flex mr-15 border-radius-10 img-hover-scale overflow-hidden">
                <a class="color-white" href="single.html" tabindex="0">
                    <img src="assets/imgs/news/thumb-11.jpg" alt="">
                </a>
            </div>
            <div class="post-content media-body align-self-center">
                <h5 class="post-title mb-15 text-limit-3-row font-medium">
                    <a href="single.html" tabindex="0">9 Things I Love About Shaving My Head During Quarantine</a>
                </h5>
            </div>
        </div>
    </li>
</ul>
'''

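from pyquery import PyQuery as pq

# Parse the HTML string into a PyQuery document
doc = pq(html)
print(doc)        # the serialized markup
print(type(doc))  # the object's type (a PyQuery instance)
print(doc('li'))  # select every <li> element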

 

3. Initializing from a URL

from pyquery import PyQuery as pq

# Fetch the page over HTTP and decode the response as UTF-8
doc = pq(url="http://www.baidu.com", encoding='utf-8')
print(doc('head'))

 

4. Initializing from a file

from pyquery import PyQuery as pq

# Read and parse a local HTML file (index.html must exist in the working directory)
doc = pq(filename='index.html')
print(doc)

 

5. CSS selectors

html = '''
<ul id="container">
    <li class="wow fadeIn">
        <div class="d-flex latest-small-thumb">
            <div class="post-thumb d-flex mr-15 border-radius-10 img-hover-scale overflow-hidden">
                <a class="color-white" href="single.html" tabindex="0">
                    <img src="assets/imgs/news/thumb-11.jpg" alt="">
                </a>
            </div>
            <div class="post-content media-body align-self-center">
                <h5 class="post-title mb-15 text-limit-3-row font-medium">
                    <a href="single.html" tabindex="0">9 Things I Love About Shaving My Head During Quarantine</a>
                </h5>
            </div>
        </div>
    </li>
</ul>
'''

from pyquery import PyQuery as pq

doc = pq(html)
# Descendant selector: elements with class "fadeIn" inside the element with id "container"
print(doc('#container .fadeIn'))

 

6. Finding child elements

html = '''
<ul id="container">
    <li class="wow fadeIn">
        <div class="d-flex latest-small-thumb">
            <div class="post-thumb d-flex mr-15 border-radius-10 img-hover-scale overflow-hidden">
                <a class="color-white" href="single.html" tabindex="0">
                    <img src="assets/imgs/news/thumb-11.jpg" alt="">
                </a>
            </div>
            <div class="post-content media-body align-self-center">
                <h5 class="post-title mb-15 text-limit-3-row font-medium">
                    <a href="single.html" tabindex="0">9 Things I Love About Shaving My Head During Quarantine</a>
                </h5>
            </div>
        </div>
    </li>
</ul>
'''

from pyquery import PyQuery as pq

doc = pq(html)
items = doc('#container')
# find() searches all descendants of the selection, not only direct children
lis = items.find('li')
print(type(lis))
print(lis)

 

7. Sibling elements

html = '''
<ul id="container">
    <li class="wow fadeIn">
        <div class="d-flex latest-small-thumb">
            <div class="post-thumb d-flex mr-15 border-radius-10 img-hover-scale overflow-hidden">
                <a class="color-white" href="single.html" tabindex="0">
                    <img src="assets/imgs/news/thumb-11.jpg" alt="">
                </a>
            </div>
            <div class="post-content media-body align-self-center">
                <h5 class="post-title mb-15 text-limit-3-row font-medium">
                    <a href="single.html" tabindex="0">9 Things I Love About Shaving My Head During Quarantine</a>
                </h5>
            </div>
        </div>
    </li>
</ul>
'''

from pyquery import PyQuery as pq

doc = pq(html)
div = doc('#container .post-thumb')
# siblings() returns the other elements that share the same parent
print(div.siblings())

 

8. Getting attributes

html = '''
<ul id="container">
    <li class="wow fadeIn">
        <div class="d-flex latest-small-thumb">
            <div class="post-thumb d-flex mr-15 border-radius-10 img-hover-scale overflow-hidden">
                <a class="color-white" href="single.html" tabindex="0">
                    <img src="assets/imgs/news/thumb-11.jpg" alt="">
                </a>
            </div>
            <div class="post-content media-body align-self-center">
                <h5 class="post-title mb-15 text-limit-3-row font-medium">
                    <a href="single.html" tabindex="0">9 Things I Love About Shaving My Head During Quarantine</a>
                </h5>
            </div>
        </div>
    </li>
</ul>
'''

from pyquery import PyQuery as pq

doc = pq(html)
a = doc('#container .post-content a')
print(a)
print(a.attr('href'))  # method-call form
print(a.attr.href)     # attribute-access form, same result

 

9. Getting text

html = '''
<ul id="container">
    <li class="wow fadeIn">
        <div class="d-flex latest-small-thumb">
            <div class="post-thumb d-flex mr-15 border-radius-10 img-hover-scale overflow-hidden">
                <a class="color-white" href="single.html" tabindex="0">
                    <img src="assets/imgs/news/thumb-11.jpg" alt="">
                </a>
            </div>
            <div class="post-content media-body align-self-center">
                <h5 class="post-title mb-15 text-limit-3-row font-medium">
                    <a href="single.html" tabindex="0">9 Things I Love About Shaving My Head During Quarantine</a>
                </h5>
            </div>
        </div>
    </li>
</ul>
'''

from pyquery import PyQuery as pq

doc = pq(html)
# text() returns the text content of the selection with all tags stripped
text = doc('#container .post-content a').text()
print(text)

 

10. Class operations

html = '''
<ul id="container">
    <li class="wow fadeIn">
        <div class="d-flex latest-small-thumb">
            <div class="post-thumb d-flex mr-15 border-radius-10 img-hover-scale overflow-hidden">
                <a class="color-white" href="single.html" tabindex="0">
                    <img src="assets/imgs/news/thumb-11.jpg" alt="">
                </a>
            </div>
            <div class="post-content media-body align-self-center">
                <h5 class="post-title mb-15 text-limit-3-row font-medium">
                    <a href="single.html" tabindex="0">9 Things I Love About Shaving My Head During Quarantine</a>
                </h5>
            </div>
        </div>
    </li>
</ul>
'''

from pyquery import PyQuery as pq

doc = pq(html)
li = doc('#container li')
print(li)
li.removeClass('fadeIn')  # remove the class in place
print(li)
li.addClass('fadeIn')     # add it back
print(li)

 

Source: https://www.cnblogs.com/autofelix/p/16177575.html
