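I. Design of the topic-focused web crawler
1. Name of the crawler
Fetching and analyzing data from Tianqi.com (天气网)
2. Content crawled and characteristics of the data
Historical weather and the weather forecast
3. Overview of the design (approach and technical difficulties)
Approach: use the weather pages to collect every city name together with the link to that city's historical weather data
Difficulty: the site's anti-crawler detection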
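The post does not say how the site detects crawlers. The sketch below shows one common way to lower the risk, assuming the check looks at the User-Agent header and at request frequency. The names `build_request` and `polite_fetch`, the header value, and the delay bounds are illustrative and are not taken from the original program.

```python
import random
import time
import urllib.request

# Assumed browser-like header; the real site may check other things as well.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}


def build_request(url, headers=None):
    """Return a urllib Request that carries browser-like headers."""
    merged = dict(DEFAULT_HEADERS)
    merged.update(headers or {})
    return urllib.request.Request(url, headers=merged)


def polite_fetch(url, opener=urllib.request.urlopen,
                 min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Wait a random interval, then fetch the page and return its text.

    `opener` and `sleep` are injectable so the function can be tried offline.
    """
    sleep(random.uniform(min_delay, max_delay))
    with opener(build_request(url), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Usage (performs a real network request, so it is left commented out):
# html = polite_fetch("https://lishi.tianqi.com")
```

Spacing requests out at random intervals keeps the load on the site low. Whether it is enough to pass the site's detection is not something the post confirms.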
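II. Structural analysis of the target pages
1. Structure and characteristics of the target pages
Extract the data from the web pages.
Page crawled: https://lishi.tianqi.com
2. HTML page parsing (historical weather and forecast)
3. Node (tag) lookup and traversal methods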
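The post gives no details on how nodes are located, so here is a minimal sketch of tag lookup and traversal using only the standard library's `html.parser`. It assumes city names appear as the text of `<a>` tags whose `href` points at the city's history page. The class `CityLinkParser`, the function `extract_city_links`, and the sample markup are hypothetical and are not copied from the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://lishi.tianqi.com"


class CityLinkParser(HTMLParser):
    """Visit every tag in document order and collect (city name, URL) pairs."""

    def __init__(self):
        super().__init__()
        self.links = []      # collected (name, absolute URL) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            if name:
                self.links.append((name, urljoin(BASE_URL, self._href)))
            self._href = None


def extract_city_links(html):
    """Parse an HTML string and return the collected city links."""
    parser = CityLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical sample markup for illustration only.
sample = """
<ul>
  <li><a href="/beijing/index.html">北京</a></li>
  <li><a href="/shanghai/index.html">上海</a></li>
  <li><a>no link</a></li>
</ul>
"""
print(extract_city_links(sample))
```

The parser callbacks fire once per node, which is the traversal. The `if tag == "a"` checks are the lookup. Anchors without an `href` are skipped.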
III. Crawler program design
1. Data crawling and collection
2. Data cleaning and processing
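The cleaning step can be illustrated without pandas. The sketch below applies the same two operations the notebook uses (a regular expression that extracts the `MM月DD日` date, and a split of the combined `high,low` temperature string) to plain tuples. The function `clean_row` and the raw sample values are assumptions made for illustration, and the temperatures are converted to integers here, which the notebook does not do.

```python
import re

DATE_PATTERN = re.compile(r"\d+月\d+日")


def clean_row(thedate, weather, temperature, wind):
    """Normalise one raw record into a dict with numeric temperatures."""
    match = DATE_PATTERN.search(thedate)
    if match is None:
        raise ValueError(f"no date found in {thedate!r}")
    # "26,13" -> high 26, low 13
    high, low = (int(part) for part in temperature.split(",", 1))
    return {
        "thedate": match.group(),
        "weather": weather.strip(),
        "wind": wind.strip(),
        "high": high,
        "lower": low,
    }


# Hypothetical raw rows; the real scraped fields may carry different noise.
raw_rows = [
    ("04月15日 今天", "阴转雨", "26,13", "南风"),
    ("04月16日 明天", " 小雨到暴雨 ", "20,10", "北风"),
]
cleaned = [clean_row(*row) for row in raw_rows]
print(cleaned[0])
```

Storing the temperatures as integers rather than strings lets later steps sort or average them directly.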
4. Data analysis and visualization
This part draws a line chart and a pie chart.
# In[18]:
import pyecharts.options as opts
from pyecharts.charts import Line


def draw_line_base(x, y1, y2):
    # Line chart with two series: daily high ("最高温度") and daily low ("最低温度")
    c = (
        Line(init_opts=opts.InitOpts(theme="dark"))
        .add_xaxis(x)
        .add_yaxis("最高温度", y1)
        .add_yaxis("最低温度", y2)
        # Chart title: "Temperature over the next seven days"
        .set_global_opts(title_opts=opts.TitleOpts(title="未来七天温度"))
    )
    return c


draw_line_base(list(data.thedate), list(data.high), list(data.lower)).render_notebook()
# In[26]:
# Group the rows by weather type and count each group
from pyecharts.charts import Pie


def draw_bar_base(x, y):
    # Despite its name, this draws a pie chart; x holds the labels, y the counts
    c = (
        Pie(init_opts=opts.InitOpts(theme="dark"))
        .add("", [list(z) for z in zip(x, y)])
        # Chart title: "Weather statistics"
        .set_global_opts(title_opts=opts.TitleOpts(title="天气统计"))
        .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}"))
    )
    return c


data_group = data.groupby(by='weather')
data_group_size = data_group.size()
draw_bar_base(data_group_size.index, list(data_group_size)).render_notebook()
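To make the pie chart's input concrete, the sketch below rebuilds the `[label, count]` pairs that `zip(x, y)` produces, using `collections.Counter` in place of pandas `groupby(...).size()`. The weather list is the `weather` column from the seven-day output printed in the notebook. The helper name `pie_pairs` is an illustrative assumption.

```python
from collections import Counter

# The "weather" column of the seven forecast rows printed by the notebook.
weather = ["阴转雨", "小雨到暴雨", "多云", "多云", "多云", "多云", "多云"]


def pie_pairs(labels):
    """Count each label and return [label, count] pairs sorted by label."""
    counts = Counter(labels)
    # Sorting by label mirrors groupby, which returns group keys in sorted order.
    return [[name, counts[name]] for name in sorted(counts)]


print(pie_pairs(weather))
# → [['多云', 5], ['小雨到暴雨', 1], ['阴转雨', 1]]
```

Each inner list becomes one slice: the first element is the slice label (`{b}` in the label formatter) and the second is its value (`{c}`).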
6. Data persistence
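The post does not show how the scraped data is saved, but the notebook later reads a header-less CSV with four columns. The sketch below is a minimal way to write and re-read such a file with the standard `csv` module. The function names `save_rows` and `load_rows` and the temporary file location are assumptions for illustration.

```python
import csv
import os
import tempfile

# Column names the notebook passes to pd.read_csv ("wine" holds the wind direction).
FIELDS = ("thedate", "weather", "temperature", "wine")


def save_rows(path, rows):
    """Write records as a header-less UTF-8 CSV, one record per line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def load_rows(path):
    """Read the CSV back as a list of dicts keyed by FIELDS."""
    with open(path, newline="", encoding="utf-8") as f:
        return [dict(zip(FIELDS, row)) for row in csv.reader(f)]


# Usage: round-trip one row through a temporary file.
with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "20200415.csv")
    save_rows(csv_path, [("04月15日", "阴转雨", "26,13", "南风")])
    print(load_rows(csv_path))
```

Because the temperature field itself contains a comma, the `csv` writer wraps it in quotes, so the file still parses as four columns when it is read back.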
7. Complete program code, compiled from the parts above
# Jupyter notebook (nbformat 4.2), kernel: Python 3, version 3.7.5

# In[ ]:
# I. Design of the topic-focused web crawler
# 1. Name of the crawler
#    Fetching and analyzing data from Tianqi.com (天气网)
# 2. Content crawled and characteristics of the data
#    Historical weather and the weather forecast
# 3. Overview of the design (approach and technical difficulties)
#    Approach: use the weather pages to collect every city name together with
#    the link to that city's historical weather data
#    Difficulty: the site's anti-crawler detection

# In[16]:
import re

import pandas as pd

filename = r"L:\Python-项目\天气\20200415.csv"
# The CSV has no header row; the "wine" column holds the wind direction
data = pd.read_csv(filename, names=("thedate", "weather", "temperature", "wine"))


def filter_thedate(frame):
    # Pull the "MM月DD日" date out of the raw field
    result = re.search(r"\d+月\d+日", frame)
    return result.group()


# Extract the date with a regular expression
data["thedate"] = data.thedate.apply(filter_thedate)

# Split the combined "high,low" temperature string into two columns
data[["high", "lower"]] = data.temperature.str.split(",", n=1, expand=True)
print(data)

# Output:
#    thedate  weather     temperature  wine    high  lower
# 0  04月15日  阴转雨       26,13        南风     26    13
# 1  04月16日  小雨到暴雨    20,10        北风     20    10
# 2  04月17日  多云         19,8         东北风   19    8
# 3  04月18日  多云         21,11        西南风   21    11
# 4  04月19日  多云         19,11        西南风   19    11
# 5  04月20日  多云         19,8         东北风   19    8
# 6  04月21日  多云         20,8         东风     20    8

# In[18]:
import pyecharts.options as opts
from pyecharts.charts import Line


def draw_line_base(x, y1, y2):
    # Line chart with two series: daily high ("最高温度") and daily low ("最低温度")
    c = (
        Line(init_opts=opts.InitOpts(theme="dark"))
        .add_xaxis(x)
        .add_yaxis("最高温度", y1)
        .add_yaxis("最低温度", y2)
        # Chart title: "Temperature over the next seven days"
        .set_global_opts(title_opts=opts.TitleOpts(title="未来七天温度"))
    )
    return c


draw_line_base(list(data.thedate), list(data.high), list(data.lower)).render_notebook()

# Output: pyecharts line chart "未来七天温度" (900px x 500px, dark theme),
# x axis 04月15日 to 04月21日, series "最高温度" and "最低温度"

# In[26]:
# Group the rows by weather type and count each group
from pyecharts.charts import Pie


def draw_bar_base(x, y):
    # Despite its name, this draws a pie chart; x holds the labels, y the counts
    c = (
        Pie(init_opts=opts.InitOpts(theme="dark"))
        .add("", [list(z) for z in zip(x, y)])
        # Chart title: "Weather statistics"
        .set_global_opts(title_opts=opts.TitleOpts(title="天气统计"))
        .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}"))
    )
    return c


data_group = data.groupby(by='weather')
data_group_size = data_group.size()
draw_bar_base(data_group_size.index, list(data_group_size)).render_notebook()

# Output: pyecharts pie chart "天气统计" (900px x 500px, dark theme),
# slices 多云: 5, 小雨到暴雨: 1, 阴转雨: 1
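IV. Conclusion
1. Analyzing and visualizing the data gives a picture of each city's historical weather and its forecast.
Timely access to weather conditions across the country makes the weather easier to anticipate and prepare for, which is very convenient.
2. A brief summary of how this programming assignment went.
Completing this course project showed me where my skills fall short and also gave me the joy of learning something new. Crawler technology makes it possible to do many cool, interesting, and useful things. During the holiday, besides studying the textbook material, I also used online resources to learn and consolidate my Python programming knowledge, which made me grow fond of this subject.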
Source: https://www.cnblogs.com/tzq-0716/p/12709494.html