Crawling City Weather from Tianqi.com (天气网) and Analyzing It

2020-04-16 14:09:47  Views: 228  Source: Internet

Tags: 04, true, city, weather, crawling, type, data, u65e5, opts


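I. Topic-Focused Web Crawler Design Plan

1. Name of the topic-focused web crawler
    Weather data acquisition and analysis from Tianqi.com
2. Content crawled and analysis of the data's characteristics
    Historical weather and forecast weather
3. Overview of the crawler design (implementation approach and technical difficulties)
    Approach: use the weather page to obtain all city names and the links to their historical weather data
    Difficulty: Tianqi.com's anti-crawler detection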
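
The anti-crawler difficulty mentioned in the plan is often handled by sending browser-like request headers and pacing the requests. The following is only a minimal sketch of that idea, not the author's code: the header value, the delay range, and the helper names `build_request`, `polite_delay`, and `fetch` are all assumptions made for illustration.

```python
import random
import time
import urllib.request

# Assumed header value; any realistic browser string could be used instead.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}


def build_request(url, headers=None):
    """Return a urllib Request carrying browser-like headers (hypothetical helper)."""
    merged = dict(DEFAULT_HEADERS)
    merged.update(headers or {})
    return urllib.request.Request(url, headers=merged)


def polite_delay(min_s=1.0, max_s=3.0, sleep=time.sleep):
    """Sleep for a random interval so requests are not sent in a tight loop."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


def fetch(url):
    """Download one page as text; UTF-8 encoding is an assumption."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

In a crawl loop, `polite_delay()` would be called between successive `fetch(...)` calls. Whether this is enough for a given site depends on how that site detects crawlers.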

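II. Structural Analysis of the Target Pages

 1. Structure and characteristics of the target pages

Data is extracted from the web pages.

Site crawled: https://lishi.tianqi.com

2. HTML page parsing (including historical and forecast weather)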

 

3. Node (tag) lookup and traversal methods
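
As an illustration of tag lookup and traversal, the sketch below walks an HTML fragment with the standard-library `html.parser` and collects every link's text and `href`. The markup in `SAMPLE_HTML` is hypothetical and the real structure of the site's pages may differ; the names `LinkCollector` and `extract_links` are also assumptions.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical markup for a city index; not copied from the real site.
SAMPLE_HTML = """
<ul>
  <li><a href="/beijing/index.html">北京</a></li>
  <li><a href="/shanghai/index.html">上海</a></li>
  <li><a>no link</a></li>
</ul>
"""


def extract_links(html):
    """Parse the HTML and return the collected (text, href) pairs."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


city_links = extract_links(SAMPLE_HTML)
```

An event-driven parser like this visits tags in document order; a tree-based library would allow lookups by class or id instead, at the cost of an extra dependency.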

 

III. Web Crawler Program Design

1. Data crawling and collection

2. Data cleaning and processing
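
The cleaning step in the complete notebook extracts the date with a regular expression and splits the combined temperature field into a high and a low value. The sketch below shows the same two steps on a single row in plain Python. It is an illustration rather than the author's code: the helper name `clean_row` and the raw date string in the usage line are assumptions, and unlike the notebook it converts the temperatures to integers.

```python
import re

# Matches dates written like "04月15日"
DATE_RE = re.compile(r"\d+月\d+日")


def clean_row(thedate, temperature):
    """Return the cleaned date plus integer high/low temperatures for one row."""
    match = DATE_RE.search(thedate)
    date = match.group() if match else None  # None when no date is found
    high, low = (int(part) for part in temperature.split(",", 1))
    return {"thedate": date, "high": high, "lower": low}


# The raw date text here is a made-up example of an uncleaned field.
example = clean_row("04月15日 今天", "26,13")
```

Converting to integers is a design choice: numeric values let a chart scale its y-axis by value instead of treating each temperature as a text label.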

4. Data analysis and visualization

This part draws a line chart and a pie chart.

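# In[18]:


import pyecharts.options as opts
from pyecharts.charts import Line

def draw_line_base(x, y1, y2):
    """Build a dark-themed line chart of daily high and low temperatures."""
    c = (
        Line(init_opts=opts.InitOpts(theme="dark"))
        .add_xaxis(x)                # dates on the x-axis
        .add_yaxis("最高温度", y1)    # series label: highest temperature
        .add_yaxis("最低温度", y2)    # series label: lowest temperature
        # chart title: "temperature over the next seven days"
        .set_global_opts(title_opts=opts.TitleOpts(title="未来七天温度"))
    )
    return c

# `data` is the DataFrame built by the cleaning cell of the complete notebook
draw_line_base(list(data.thedate), list(data.high), list(data.lower)).render_notebook()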

 

 

# In[26]:


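# Group the rows by weather type and count each group

from pyecharts.charts import Pie

# Note: despite its name, this function draws a pie chart.
# It relies on `opts` imported in the previous cell.
def draw_bar_base(x,y):
    c = (
        Pie(init_opts=opts.InitOpts(theme="dark"))
        .add("", [list(z) for z in zip(x, y)])  # [weather name, count] pairs
        # chart title: "weather statistics"
        .set_global_opts(title_opts=opts.TitleOpts(title="天气统计"))
        .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}"))
    )
    return c
data_group = data.groupby(by='weather')
data_group_size = data_group.size()
draw_bar_base(data_group_size.index, list(data_group_size)).render_notebook()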
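
To make clear what the grouping feeds into the pie chart, here is a small standard-library sketch that counts the weather types using the seven values printed in the notebook output. It uses `collections.Counter` in place of the pandas `groupby`, so it is an equivalent illustration rather than the code the article runs; the variable names are assumptions.

```python
from collections import Counter

# The "weather" column as it appears in the notebook's printed table
weather = ["阴转雨", "小雨到暴雨", "多云", "多云", "多云", "多云", "多云"]

# Count how many days fall under each weather type
counts = Counter(weather)

# Sort by name, then build the [name, count] pairs that a pie series expects
pie_pairs = [[name, n] for name, n in sorted(counts.items())]
```

For this data the pairs are 多云: 5, 小雨到暴雨: 1, and 阴转雨: 1, which matches the `data` array of the pie series in the notebook output.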

 

6. Data persistence
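
The notebook reads its data from a header-less CSV file with four columns. A minimal sketch of writing rows in that shape with the standard `csv` module might look like the following. The helper names `save_rows` and `load_rows`, the temporary file location, and the use of already-cleaned dates are assumptions; the two sample rows are taken from the notebook's printed output.

```python
import csv
import os
import tempfile


def save_rows(path, rows):
    """Write rows without a header line, the layout the notebook reads back."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def load_rows(path):
    """Read every row back as a list of strings."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


# Columns: date, weather, "high,low" temperature, wind direction
rows = [
    ["04月15日", "阴转雨", "26,13", "南风"],
    ["04月16日", "小雨到暴雨", "20,10", "北风"],
]

path = os.path.join(tempfile.mkdtemp(), "weather.csv")
save_rows(path, rows)
restored = load_rows(path)
```

Because the temperature field itself contains a comma, the writer quotes it automatically, so it is read back as a single column.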

7. Complete program code, combining all the parts above

{
 "cells": [
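  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# I. Topic-Focused Web Crawler Design Plan\n",
    "1. Name of the topic-focused web crawler\n",
    "    Weather data acquisition and analysis from Tianqi.com\n",
    "2. Content crawled and analysis of the data's characteristics\n",
    "    Historical weather and forecast weather\n",
    "3. Overview of the crawler design (implementation approach and technical difficulties)\n",
    "    Approach: use the weather page to obtain all city names and the links to their historical weather data\n",
    "    Difficulty: Tianqi.com's anti-crawler detection\n"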
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  thedate weather temperature wine high lower\n",
      "0  04月15日     阴转雨       26,13   南风   26    13\n",
      "1  04月16日   小雨到暴雨       20,10   北风   20    10\n",
      "2  04月17日      多云        19,8  东北风   19     8\n",
      "3  04月18日      多云       21,11  西南风   21    11\n",
      "4  04月19日      多云       19,11  西南风   19    11\n",
      "5  04月20日      多云        19,8  东北风   19     8\n",
      "6  04月21日      多云        20,8   东风   20     8\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "filename = r\"L:\\Python-项目\\天气\\20200415.csv\"\n",
    "data = pd.read_csv(filename, names=(\"thedate\", \"weather\", \"temperature\", \"wine\"))\n",
    "\n",
    "import re\n",
    "def filter_thedate(frame):\n",
    "    # Keep only the date part, e.g. 04月15日\n",
    "    result = re.search(r\"\\d+月\\d+日\", frame)\n",
    "    return result.group()\n",
    "\n",
    "# Extract the date with a regular expression\n",
    "data[\"thedate\"] = data.thedate.apply(filter_thedate)\n",
    "\n",
    "# Split the combined temperature string into separate high and low columns\n",
    "data[[\"high\", \"lower\"]] = data.temperature.str.split(\",\", n=1, expand=True)\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "<script>\n",
       "    require.config({\n",
       "        paths: {\n",
       "            'echarts':'https://assets.pyecharts.org/assets/echarts.min'\n",
       "        }\n",
       "    });\n",
       "</script>\n",
       "\n",
       "        <div id=\"a58f4ae55df7444dbab81789828790a3\" style=\"width:900px; height:500px;\"></div>\n",
       "\n",
       "<script>\n",
       "        require(['echarts'], function(echarts) {\n",
       "                var chart_a58f4ae55df7444dbab81789828790a3 = echarts.init(\n",
       "                    document.getElementById('a58f4ae55df7444dbab81789828790a3'), 'dark', {renderer: 'canvas'});\n",
       "                var option_a58f4ae55df7444dbab81789828790a3 = {\n",
       "    \"animation\": true,\n",
       "    \"animationThreshold\": 2000,\n",
       "    \"animationDuration\": 1000,\n",
       "    \"animationEasing\": \"cubicOut\",\n",
       "    \"animationDelay\": 0,\n",
       "    \"animationDurationUpdate\": 300,\n",
       "    \"animationEasingUpdate\": \"cubicOut\",\n",
       "    \"animationDelayUpdate\": 0,\n",
       "    \"series\": [\n",
       "        {\n",
       "            \"type\": \"line\",\n",
       "            \"name\": \"\\u6700\\u9ad8\\u6e29\\u5ea6\",\n",
       "            \"connectNulls\": false,\n",
       "            \"symbolSize\": 4,\n",
       "            \"showSymbol\": true,\n",
       "            \"smooth\": false,\n",
       "            \"step\": false,\n",
       "            \"data\": [\n",
       "                [\n",
       "                    \"04\\u670815\\u65e5\",\n",
       "                    \"26\"\n",
       "                ],\n",
       "                [\n",
       "                    \"04\\u670816\\u65e5\",\n",
       "                    \"20\"\n",
       "                ],\n",
       "                [\n",
       "                    \"04\\u670817\\u65e5\",\n",
       "                    \"19\"\n",
       "                ],\n",
       "                [\n",
       "                    \"04\\u670818\\u65e5\",\n",
       "                    \"21\"\n",
       "                ],\n",
       "                [\n",
       "                    \"04\\u670819\\u65e5\",\n",
       "                    \"19\"\n",
       "                ],\n",
       "                [\n",
       "                    \"04\\u670820\\u65e5\",\n",
       "                    \"19\"\n",
       "                ],\n",
       "                [\n",
       "                    \"04\\u670821\\u65e5\",\n",
       "                    \"20\"\n",
       "                ]\n",
       "            ],\n",
       "            \"hoverAnimation\": true,\n",
       "            \"label\": {\n",
       "                \"show\": true,\n",
       "                \"position\": \"top\",\n",
       "                \"margin\": 8\n",
       "            },\n",
       "            \"lineStyle\": {\n",
       "                \"width\": 1,\n",
       "                \"opacity\": 1,\n",
       "                \"curveness\": 0,\n",
       "                \"type\": \"solid\"\n",
       "            },\n",
       "            \"areaStyle\": {\n",
       "                \"opacity\": 0\n",
       "            }\n",
       "        },\n",
       "        {\n",
       "            \"type\": \"line\",\n",
       "            \"name\": \"\\u6700\\u4f4e\\u6e29\\u5ea6\",\n",
       "            \"connectNulls\": false,\n",
       "            \"symbolSize\": 4,\n",
       "            \"showSymbol\": true,\n",
       "            \"smooth\": false,\n",
       "            \"step\": false,\n",
       "            \"data\": [\n",
       "                [\n",
       "                    \"04\\u670815\\u65e5\",\n",
       "                    \"13\"\n",
       "                ],\n",
       "                [\n",
       "                    \"04\\u670816\\u65e5\",\n",
       "                    \"10\"\n",
       "                ],\n",
       "                [\n",
       "                    \"04\\u670817\\u65e5\",\n",
       "                    \"8\"\n",
       "                ],\n",
       "                [\n",
       "                    \"04\\u670818\\u65e5\",\n",
       "                    \"11\"\n",
       "                ],\n",
       "                [\n",
       "                    \"04\\u670819\\u65e5\",\n",
       "                    \"11\"\n",
       "                ],\n",
       "                [\n",
       "                    \"04\\u670820\\u65e5\",\n",
       "                    \"8\"\n",
       "                ],\n",
       "                [\n",
       "                    \"04\\u670821\\u65e5\",\n",
       "                    \"8\"\n",
       "                ]\n",
       "            ],\n",
       "            \"hoverAnimation\": true,\n",
       "            \"label\": {\n",
       "                \"show\": true,\n",
       "                \"position\": \"top\",\n",
       "                \"margin\": 8\n",
       "            },\n",
       "            \"lineStyle\": {\n",
       "                \"width\": 1,\n",
       "                \"opacity\": 1,\n",
       "                \"curveness\": 0,\n",
       "                \"type\": \"solid\"\n",
       "            },\n",
       "            \"areaStyle\": {\n",
       "                \"opacity\": 0\n",
       "            }\n",
       "        }\n",
       "    ],\n",
       "    \"legend\": [\n",
       "        {\n",
       "            \"data\": [\n",
       "                \"\\u6700\\u9ad8\\u6e29\\u5ea6\",\n",
       "                \"\\u6700\\u4f4e\\u6e29\\u5ea6\"\n",
       "            ],\n",
       "            \"selected\": {\n",
       "                \"\\u6700\\u9ad8\\u6e29\\u5ea6\": true,\n",
       "                \"\\u6700\\u4f4e\\u6e29\\u5ea6\": true\n",
       "            },\n",
       "            \"show\": true,\n",
       "            \"padding\": 5,\n",
       "            \"itemGap\": 10,\n",
       "            \"itemWidth\": 25,\n",
       "            \"itemHeight\": 14\n",
       "        }\n",
       "    ],\n",
       "    \"tooltip\": {\n",
       "        \"show\": true,\n",
       "        \"trigger\": \"item\",\n",
       "        \"triggerOn\": \"mousemove|click\",\n",
       "        \"axisPointer\": {\n",
       "            \"type\": \"line\"\n",
       "        },\n",
       "        \"textStyle\": {\n",
       "            \"fontSize\": 14\n",
       "        },\n",
       "        \"borderWidth\": 0\n",
       "    },\n",
       "    \"xAxis\": [\n",
       "        {\n",
       "            \"show\": true,\n",
       "            \"scale\": false,\n",
       "            \"nameLocation\": \"end\",\n",
       "            \"nameGap\": 15,\n",
       "            \"gridIndex\": 0,\n",
       "            \"inverse\": false,\n",
       "            \"offset\": 0,\n",
       "            \"splitNumber\": 5,\n",
       "            \"minInterval\": 0,\n",
       "            \"splitLine\": {\n",
       "                \"show\": false,\n",
       "                \"lineStyle\": {\n",
       "                    \"width\": 1,\n",
       "                    \"opacity\": 1,\n",
       "                    \"curveness\": 0,\n",
       "                    \"type\": \"solid\"\n",
       "                }\n",
       "            },\n",
       "            \"data\": [\n",
       "                \"04\\u670815\\u65e5\",\n",
       "                \"04\\u670816\\u65e5\",\n",
       "                \"04\\u670817\\u65e5\",\n",
       "                \"04\\u670818\\u65e5\",\n",
       "                \"04\\u670819\\u65e5\",\n",
       "                \"04\\u670820\\u65e5\",\n",
       "                \"04\\u670821\\u65e5\"\n",
       "            ]\n",
       "        }\n",
       "    ],\n",
       "    \"yAxis\": [\n",
       "        {\n",
       "            \"show\": true,\n",
       "            \"scale\": false,\n",
       "            \"nameLocation\": \"end\",\n",
       "            \"nameGap\": 15,\n",
       "            \"gridIndex\": 0,\n",
       "            \"inverse\": false,\n",
       "            \"offset\": 0,\n",
       "            \"splitNumber\": 5,\n",
       "            \"minInterval\": 0,\n",
       "            \"splitLine\": {\n",
       "                \"show\": false,\n",
       "                \"lineStyle\": {\n",
       "                    \"width\": 1,\n",
       "                    \"opacity\": 1,\n",
       "                    \"curveness\": 0,\n",
       "                    \"type\": \"solid\"\n",
       "                }\n",
       "            }\n",
       "        }\n",
       "    ],\n",
       "    \"title\": [\n",
       "        {\n",
       "            \"text\": \"\\u672a\\u6765\\u4e03\\u5929\\u6e29\\u5ea6\"\n",
       "        }\n",
       "    ]\n",
       "};\n",
       "                chart_a58f4ae55df7444dbab81789828790a3.setOption(option_a58f4ae55df7444dbab81789828790a3);\n",
       "        });\n",
       "    </script>\n"
      ],
      "text/plain": [
       "<pyecharts.render.display.HTML at 0x1e3ddcf14c8>"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pyecharts.options as opts\n",
    "from pyecharts.charts import Line\n",
    "\n",
    "def draw_line_base(x, y1, y2):\n",
    "    c = (\n",
    "        Line(init_opts=opts.InitOpts(theme=\"dark\"))\n",
    "        .add_xaxis(x)\n",
    "        .add_yaxis(\"最高温度\", y1)\n",
    "        .add_yaxis(\"最低温度\", y2)\n",
    "        .set_global_opts(title_opts=opts.TitleOpts(title=\"未来七天温度\"))\n",
    "    )\n",
    "    return c\n",
    "\n",
    "draw_line_base(list(data.thedate), list(data.high), list(data.lower)).render_notebook()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "<script>\n",
       "    require.config({\n",
       "        paths: {\n",
       "            'echarts':'https://assets.pyecharts.org/assets/echarts.min'\n",
       "        }\n",
       "    });\n",
       "</script>\n",
       "\n",
       "        <div id=\"af320667e80c4558be518ee3263149cc\" style=\"width:900px; height:500px;\"></div>\n",
       "\n",
       "<script>\n",
       "        require(['echarts'], function(echarts) {\n",
       "                var chart_af320667e80c4558be518ee3263149cc = echarts.init(\n",
       "                    document.getElementById('af320667e80c4558be518ee3263149cc'), 'dark', {renderer: 'canvas'});\n",
       "                var option_af320667e80c4558be518ee3263149cc = {\n",
       "    \"animation\": true,\n",
       "    \"animationThreshold\": 2000,\n",
       "    \"animationDuration\": 1000,\n",
       "    \"animationEasing\": \"cubicOut\",\n",
       "    \"animationDelay\": 0,\n",
       "    \"animationDurationUpdate\": 300,\n",
       "    \"animationEasingUpdate\": \"cubicOut\",\n",
       "    \"animationDelayUpdate\": 0,\n",
       "    \"series\": [\n",
       "        {\n",
       "            \"type\": \"pie\",\n",
       "            \"clockwise\": true,\n",
       "            \"data\": [\n",
       "                {\n",
       "                    \"name\": \"\\u591a\\u4e91\",\n",
       "                    \"value\": 5\n",
       "                },\n",
       "                {\n",
       "                    \"name\": \"\\u5c0f\\u96e8\\u5230\\u66b4\\u96e8\",\n",
       "                    \"value\": 1\n",
       "                },\n",
       "                {\n",
       "                    \"name\": \"\\u9634\\u8f6c\\u96e8\",\n",
       "                    \"value\": 1\n",
       "                }\n",
       "            ],\n",
       "            \"radius\": [\n",
       "                \"0%\",\n",
       "                \"75%\"\n",
       "            ],\n",
       "            \"center\": [\n",
       "                \"50%\",\n",
       "                \"50%\"\n",
       "            ],\n",
       "            \"label\": {\n",
       "                \"show\": true,\n",
       "                \"position\": \"top\",\n",
       "                \"margin\": 8,\n",
       "                \"formatter\": \"{b}: {c}\"\n",
       "            },\n",
       "            \"rippleEffect\": {\n",
       "                \"show\": true,\n",
       "                \"brushType\": \"stroke\",\n",
       "                \"scale\": 2.5,\n",
       "                \"period\": 4\n",
       "            }\n",
       "        }\n",
       "    ],\n",
       "    \"legend\": [\n",
       "        {\n",
       "            \"data\": [\n",
       "                \"\\u591a\\u4e91\",\n",
       "                \"\\u5c0f\\u96e8\\u5230\\u66b4\\u96e8\",\n",
       "                \"\\u9634\\u8f6c\\u96e8\"\n",
       "            ],\n",
       "            \"selected\": {},\n",
       "            \"show\": true,\n",
       "            \"padding\": 5,\n",
       "            \"itemGap\": 10,\n",
       "            \"itemWidth\": 25,\n",
       "            \"itemHeight\": 14\n",
       "        }\n",
       "    ],\n",
       "    \"tooltip\": {\n",
       "        \"show\": true,\n",
       "        \"trigger\": \"item\",\n",
       "        \"triggerOn\": \"mousemove|click\",\n",
       "        \"axisPointer\": {\n",
       "            \"type\": \"line\"\n",
       "        },\n",
       "        \"textStyle\": {\n",
       "            \"fontSize\": 14\n",
       "        },\n",
       "        \"borderWidth\": 0\n",
       "    },\n",
       "    \"title\": [\n",
       "        {\n",
       "            \"text\": \"\\u5929\\u6c14\\u7edf\\u8ba1\"\n",
       "        }\n",
       "    ]\n",
       "};\n",
       "                chart_af320667e80c4558be518ee3263149cc.setOption(option_af320667e80c4558be518ee3263149cc);\n",
       "        });\n",
       "    </script>\n"
      ],
      "text/plain": [
       "<pyecharts.render.display.HTML at 0x1e3dff69648>"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Group the rows by weather type and count each group\n",
    "\n",
    "from pyecharts.charts import Pie\n",
    "\n",
    "def draw_bar_base(x,y):\n",
    "    c = (\n",
    "        Pie(init_opts=opts.InitOpts(theme=\"dark\"))\n",
    "        .add(\"\", [list(z) for z in zip(x, y)])\n",
    "        .set_global_opts(title_opts=opts.TitleOpts(title=\"天气统计\"))\n",
    "        .set_series_opts(label_opts=opts.LabelOpts(formatter=\"{b}: {c}\"))\n",
    "    )\n",
    "    return c\n",
    "data_group = data.groupby(by='weather')\n",
    "data_group_size = data_group.size()\n",
    "draw_bar_base(data_group_size.index, list(data_group_size)).render_notebook()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}

 

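IV. Conclusion

1. Analyzing and visualizing the collected data shows each city's historical and forecast weather.

Having timely access to weather conditions across the country makes them easier to anticipate and prepare for, which is very convenient.

2. A brief summary of how this programming assignment went.

Completing this course project showed me my own shortcomings and the joy of acquiring knowledge. Crawler technology makes it possible to do many cool, interesting, and useful things. During the holiday, besides studying the textbook, I also used online resources to learn and consolidate my Python programming knowledge, which made me come to love this subject.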

 

Source: https://www.cnblogs.com/tzq-0716/p/12709494.html