A Practical Case: Converting bz2 Data to Parquet Files for Precision Ad Targeting

2021-06-27 12:29:28  Source: Internet


Introduction: What Parquet Is Used For

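(1) Parquet implements the data model and algorithms of Google's Dremel system. It can skip data that does not match the query conditions and read only the data that is needed, which reduces the amount of I/O.

(2) Compression and encoding reduce disk usage. Because all values in a column share the same data type, more efficient encodings (for example Run Length Encoding and Delta Encoding) can be applied to save further storage space.

(3) Because Parquet follows the Dremel data model and algorithms, a query reads only the columns it needs, and vectorized operations are supported, which gives better scan performance.

(4) If HDFS is the de facto standard distributed file system of the big data era, then Parquet is the de facto standard file storage format of that era.

(5) Parquet greatly reduces disk I/O and typically cuts storage space by about 75%, which in turn greatly reduces the amount of input data Spark SQL has to process. In Spark 1.6.x in particular, filter pushdown can in some cases greatly reduce both disk I/O and memory usage.

(6) In Spark 1.6.x, Parquet greatly improves scan throughput and lookup speed. Compared with Spark 1.5.x, Spark 1.6 is roughly twice as fast, and CPU usage when operating on Parquet has also been heavily optimized and effectively reduced.

(7) Using Parquet can greatly optimize Spark's scheduling and execution. In our tests, Parquet effectively reduced the execution cost of stages and also improved the execution path.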
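To make point (2) concrete, here is a minimal sketch of run-length encoding on a single column, written in plain Scala. It only illustrates the idea that a column of same-typed, repetitive values collapses well; it is not Parquet's actual encoder, and the object name `RunLengthSketch` and the sample values are made up for this example.

```scala
object RunLengthSketch {
  /** Collapse consecutive equal values into (value, runLength) pairs. */
  def encode[A](column: Seq[A]): List[(A, Int)] =
    column.foldLeft(List.empty[(A, Int)]) {
      // Same value as the current run: extend the run
      case ((value, n) :: rest, x) if value == x => (value, n + 1) :: rest
      // Otherwise start a new run
      case (acc, x)                              => (x, 1) :: acc
    }.reverse

  /** Expand the pairs back into the original column. */
  def decode[A](runs: Seq[(A, Int)]): List[A] =
    runs.toList.flatMap { case (value, n) => List.fill(n)(value) }

  def main(args: Array[String]): Unit = {
    // Hypothetical values of the `client` column (1: android, 2: ios, 3: wp)
    val client = Seq(1, 1, 1, 1, 2, 2, 1, 1, 1, 3)
    val runs = encode(client)
    println(runs)                          // List((1,4), (2,2), (1,3), (3,1))
    println(decode(runs) == client.toList) // true
  }
}
```

Ten values shrink to four pairs here. In a row-oriented file the same values would be interleaved with the other fields of each record, so runs like these would not appear.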

Log Field Description

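1. sessionid: String, session identifier
2. advertisersid: Int, advertiser ID
3. adorderid: Int, ad ID
4. adcreativeid: Int, ad creative ID (>= 200000: dsp)
5. adplatformproviderid: Int, ad platform provider ID (>= 100000: rtb)
6. sdkversion: String, SDK version
7. adplatformkey: String, platform provider key
8. putinmodeltype: Int, delivery mode for the advertiser, 1: impression-based delivery, 2: click...
9. requestmode: Int, request type (1: request, 2: impression, 3: click)
10. adprice: Double, ad price
11. requestdate: String, request time, format: yyyy-m-dd hh:mm:ss
12. ip: String, real IP address of the device user
13. appid: String, application ID
14. appname: String, application name
15. uuid: String, unique device identifier
16. device: String, device model, e.g. htc, iphone
17. client: Int, operating system (1: android, 2: ios, 3: wp)
18. osversion: String, device operating system version
19. density: String, device screen density
20. pw: Int, device screen width
21. ph: Int, device screen height
22. long: String, longitude of the device
23. lat: String, latitude of the device
24. provincename: String, province where the device is located
25. cityname: String, city where the device is located
26. ispid: Int, carrier ID
27. ispname: String, carrier name
28. networkmannerid: Int, network connection type ID
29. networkmannername: String, network connection type name
30. iseffective: Int, validity flag (valid means billable) (0: invalid, 1: ...
31. isbilling: Int, whether billed (0: not billed, 1: billed)
32. adspacetype: Int, ad slot type (1: banner, 2: interstitial, 3: full screen)
33. adspacetypename: String, ad slot type name (banner, interstitial, full screen)
34. devicetype: Int, device type (1: phone, 2: tablet)
35. processnode: Int, process node (1: request-volume KPI, 2: valid request, 3: ad req...
36. apptype: Int, application type ID
37. district: String, county where the device is located
38. paymode: Int, payment mode for the platform provider, 1: impression-based delivery (CPM), 2: click...
39. isbid: Int, whether RTB
40. bidprice: Double, RTB bid price
41. winprice: Double, RTB winning price
42. iswin: Int, whether the bid was won
43. cur: String, values: usd|rmb, etc.
44. rate: Double, exchange rate
45. cnywinprice: Double, RTB winning price converted to RMB
46. imei: String, imei
47. mac: String, mac
48. idfa: String, idfa
49. openudid: String, openudid
50. androidid: String, androidid
51. rtbprovince: String, RTB province
52. rtbcity: String, RTB city
53. rtbdistrict: String, RTB district
54. rtbstreet: String, RTB street
55. storeurl: String, app store download URL
56. realip: String, real IP
57. isqualityapp: Int, preferred-app flag
58. bidfloor: Double, floor price
59. aw: Int, ad slot width
60. ah: Int, ad slot height
61. imeimd5: String, imei_md5
62. macmd5: String, mac_md5
63. idfamd5: String, idfa_md5
64. openudidmd5: String, openudid_md5
65. androididmd5: String, androidid_md5
66. imeisha1: String, imei_sha1
67. macsha1: String, mac_sha1
68. idfasha1: String, idfa_sha1
69. openudidsha1: String, openudid_sha1
70. androididsha1: String, androidid_sha1
71. uuidunknow: String, uuid_unknow, tanx ciphertext
72. userid: String, platform user ID
73. iptype: Int, IP type
74. initbidprice: Double, initial bid price
75. adpayment: Double, ad spend after conversion
76. agentrate: Double, agent profit margin
77. lrate: Double, agency profit margin
78. adxrate: Double, media profit margin
79. title: String, title
80. keywords: String, keywords
81. tagid: String, ad slot identifier (the video ID when the traffic is video)
82. callbackdate: String, callback time, format: YYYY/mm/dd hh:mm:ss
83. channelid: String, channel ID
84. mediatype: Int, media type: 1 long-tail media, 2 video media, 3 independent media; default: 1

Note: the schema in Bz2Parquet.scala below additionally declares an adppprice: Double field right after adprice, which is why the code expects at least 85 fields per record.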

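Log Format Conversion

The log files are supplied in bz2 format, which is a compressed file format. To make later statistics easier, we convert the bz2 files to Parquet files.

Why convert bz2 files to Parquet?

Because Parquet is a columnar storage format, with these advantages:

① A suitable compression algorithm can be chosen per column, which further reduces disk usage;

② Columns that a query does not need can be skipped, which reduces disk scanning and improves I/O performance;

③ It is compatible with many big data processing frameworks, such as Hive and Spark.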

The Code

Create the dolphin-doit01 project. [Figure: project code structure]

POM file

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.4.4</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>cn.sheep</groupId>
    <artifactId>dolphin-doit01</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>dolphin-doit01</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <!--scala library-->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.10.6</version>
        </dependency>

        <!--spark cores-->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.6.3</version>
        </dependency>

        <!--spark sql-->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.10</artifactId>
            <version>1.6.3</version>
        </dependency>

        <!--mysql-->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.42</version>
        </dependency>

        <dependency>
            <groupId>io.netty</groupId>
            <artifactId>netty-all</artifactId>
            <version>4.1.17.Final</version>
        </dependency>


    </dependencies>

    <build>
        <plugins>
            <!-- Scala compiler plugin -->
            <plugin>
                <!-- see http://davidb.github.com/scala-maven-plugin -->
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.1.3</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-make:transitive</arg>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
<!--                <version>2.13</version>-->
                <configuration>
                    <useFile>false</useFile>
                    <disableXmlReport>true</disableXmlReport>
                    <!-- If you have classpath issue like NoDefClassError,... -->
                    <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
                    <includes>
                        <include>**/*Test.*</include>
                        <include>**/*Suite.*</include>
                    </includes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

dolphin-doit01\src\main\scala\cn\sheep\dolphin\etl\Bz2Parquet.scala


package cn.sheep.dolphin.etl

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}

/** Convert bz2 log files to parquet files
 * author: old sheep
 * Created 2021/03/20  21:40
 */
object Bz2Parquet {
  def main(args: Array[String]): Unit = {
    // Validate arguments
    if (args.length != 2) {
      println(
        """
          |Usage: cn.sheep.dolphin.etl.Bz2Parquet
          |Param:
          |  bz2InputPath    input path of the bz2 log files
          |  parquetOutPath  output path of the parquet files
        """.stripMargin)
      sys.exit(-1) // -1: abnormal exit
    }

    // Bind the two arguments via pattern matching
    val Array(bz2InputPath, parquetOutPath) = args

    val conf = new SparkConf()
      .setAppName("Convert bz2 log files to parquet files")
      .setMaster("local[*]")

    val sc = new SparkContext(conf)

    // Read the offline bz2 log files (textFile decompresses .bz2 input transparently)
    val data = sc.textFile(bz2InputPath)

    // Drop malformed records: limit -1 keeps trailing empty fields,
    // and a valid record must have at least 85 fields
    val filteredRDD: RDD[Array[String]] = data.map(_.split(",", -1)).filter(_.size >= 85)

    // parquet <- DataFrame (several ways to create one) <- SQLContext <- RDD
    val sqlc = new SQLContext(sc)

    //val sc = DolphinAppComm.createSparkContext("Convert bz2 log files to parquet files")

    // Import the implicit conversion that adds toIntPlus / toDoublePlus to String
    import cn.sheep.dolphin.bean.RichString._

    // RDD[Row] <- RDD[Array[String]]
    val rowRDD = filteredRDD.map(arr => Row(
      arr(0),
      arr(1).toIntPlus,
      arr(2).toIntPlus,
      arr(3).toIntPlus,
      arr(4).toIntPlus,
      arr(5),
      arr(6),
      arr(7).toIntPlus,
      arr(8).toIntPlus,
      arr(9).toDoublePlus,
      arr(10).toDoublePlus,
      arr(11),
      arr(12),
      arr(13),
      arr(14),
      arr(15),
      arr(16),
      arr(17).toIntPlus,
      arr(18),
      arr(19),
      arr(20).toIntPlus,
      arr(21).toIntPlus,
      arr(22),
      arr(23),
      arr(24),
      arr(25),
      arr(26).toIntPlus,
      arr(27),
      arr(28).toIntPlus,
      arr(29),
      arr(30).toIntPlus,
      arr(31).toIntPlus,
      arr(32).toIntPlus,
      arr(33),
      arr(34).toIntPlus,
      arr(35).toIntPlus,
      arr(36).toIntPlus,
      arr(37),
      arr(38).toIntPlus,
      arr(39).toIntPlus,
      arr(40).toDoublePlus,
      arr(41).toDoublePlus,
      arr(42).toIntPlus,
      arr(43),
      arr(44).toDoublePlus,
      arr(45).toDoublePlus,
      arr(46),
      arr(47),
      arr(48),
      arr(49),
      arr(50),
      arr(51),
      arr(52),
      arr(53),
      arr(54),
      arr(55),
      arr(56),
      arr(57).toIntPlus,
      arr(58).toDoublePlus,
      arr(59).toIntPlus,
      arr(60).toIntPlus,
      arr(61),
      arr(62),
      arr(63),
      arr(64),
      arr(65),
      arr(66),
      arr(67),
      arr(68),
      arr(69),
      arr(70),
      arr(71),
      arr(72),
      arr(73).toIntPlus,
      arr(74).toDoublePlus,
      arr(75).toDoublePlus,
      arr(76).toDoublePlus,
      arr(77).toDoublePlus,
      arr(78).toDoublePlus,
      arr(79),
      arr(80),
      arr(81),
      arr(82),
      arr(83),
      arr(84).toIntPlus
    ))

    // schema: StructType <- demo
    val schema = StructType(Seq(
      StructField("sessionid", StringType),
      StructField("advertisersid", IntegerType),
      StructField("adorderid", IntegerType),
      StructField("adcreativeid", IntegerType),
      StructField("adplatformproviderid", IntegerType),
      StructField("sdkversion", StringType),
      StructField("adplatformkey", StringType),
      StructField("putinmodeltype", IntegerType),
      StructField("requestmode", IntegerType),
      StructField("adprice", DoubleType),
      StructField("adppprice", DoubleType),
      StructField("requestdate", StringType),
      StructField("ip", StringType),
      StructField("appid", StringType),
      StructField("appname", StringType),
      StructField("uuid", StringType),
      StructField("device", StringType),
      StructField("client", IntegerType),
      StructField("osversion", StringType),
      StructField("density", StringType),
      StructField("pw", IntegerType),
      StructField("ph", IntegerType),
      StructField("long", StringType),
      StructField("lat", StringType),
      StructField("provincename", StringType),
      StructField("cityname", StringType),
      StructField("ispid", IntegerType),
      StructField("ispname", StringType),
      StructField("networkmannerid", IntegerType),
      StructField("networkmannername",StringType),
      StructField("iseffective", IntegerType),
      StructField("isbilling", IntegerType),
      StructField("adspacetype", IntegerType),
      StructField("adspacetypename", StringType),
      StructField("devicetype", IntegerType),
      StructField("processnode", IntegerType),
      StructField("apptype", IntegerType),
      StructField("district", StringType),
      StructField("paymode", IntegerType),
      StructField("isbid", IntegerType),
      StructField("bidprice", DoubleType),
      StructField("winprice", DoubleType),
      StructField("iswin", IntegerType),
      StructField("cur", StringType),
      StructField("rate", DoubleType),
      StructField("cnywinprice", DoubleType),
      StructField("imei", StringType),
      StructField("mac", StringType),
      StructField("idfa", StringType),
      StructField("openudid", StringType),
      StructField("androidid", StringType),
      StructField("rtbprovince", StringType),
      StructField("rtbcity", StringType),
      StructField("rtbdistrict", StringType),
      StructField("rtbstreet", StringType),
      StructField("storeurl", StringType),
      StructField("realip", StringType),
      StructField("isqualityapp", IntegerType),
      StructField("bidfloor", DoubleType),
      StructField("aw", IntegerType),
      StructField("ah", IntegerType),
      StructField("imeimd5", StringType),
      StructField("macmd5", StringType),
      StructField("idfamd5", StringType),
      StructField("openudidmd5", StringType),
      StructField("androididmd5", StringType),
      StructField("imeisha1", StringType),
      StructField("macsha1", StringType),
      StructField("idfasha1", StringType),
      StructField("openudidsha1", StringType),
      StructField("androididsha1", StringType),
      StructField("uuidunknow", StringType),
      StructField("userid", StringType),
      StructField("iptype", IntegerType),
      StructField("initbidprice", DoubleType),
      StructField("adpayment", DoubleType),
      StructField("agentrate", DoubleType),
      StructField("lrate", DoubleType),
      StructField("adxrate", DoubleType),
      StructField("title", StringType),
      StructField("keywords", StringType),
      StructField("tagid", StringType),
      StructField("callbackdate", StringType),
      StructField("channelid", StringType),
      StructField("mediatype", IntegerType)
    ))


    /**
     * RDD[Row] <- RDD[Array[String]]
     * schema: StructType <- demo
     */
    val dataFrame = sqlc.createDataFrame(rowRDD, schema)

    // dataFrame -> parquet
    // Parquet output is gzip-compressed by default (Spark 1.6)
    dataFrame.write.parquet(parquetOutPath)

    sc.stop()
  }
}
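
The filter step above relies on `split(",", -1)`. The following sketch, in plain Scala without Spark, shows why the `-1` limit matters: without it, trailing empty fields are dropped and a valid record whose last columns are empty would fail the field-count check. The names `SplitSketch`, `FieldCount` and `parse` are hypothetical and exist only for this illustration.

```scala
object SplitSketch {
  // Minimum number of fields per record, as in Bz2Parquet's filter
  val FieldCount = 85

  /** Split one comma-separated log line, keeping trailing empty fields.
    * Returns None when the record has too few fields. */
  def parse(line: String): Option[Array[String]] = {
    val fields = line.split(",", -1)
    if (fields.length >= FieldCount) Some(fields) else None
  }

  def main(args: Array[String]): Unit = {
    val line = "a,b,,"
    println(line.split(",").length)     // 2: trailing empty fields are dropped
    println(line.split(",", -1).length) // 4: trailing empty fields are kept
  }
}
```

Returning an `Option` instead of filtering on array size keeps the check in one place, so the same function could be used with `flatMap` on an RDD or on a local collection.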

dolphin-doit01\src\main\scala\cn\sheep\dolphin\utils\NBFormat.scala

 

package cn.sheep.dolphin.utils

import org.apache.commons.lang.StringUtils
/** Formatting helper for numeric strings
 * author: old sheep
 * Created 2021/3/21  11:46
 */
object NBFormat {


  def apply(str: String) = {
    try {
      if (StringUtils.isNotEmpty(str)) {
        str.trim.toInt
      } else 0
    } catch {
      case _: Exception => 0
    }
  }

}

dolphin-doit01\src\main\scala\cn\sheep\dolphin\bean\RichString.scala

package cn.sheep.dolphin.bean

/**
 * author: old sheep
 * Created 2021/03/21
 */
class RichString(val str: String) {

  def toIntPlus = try {
    str.toInt
  } catch {
    case _: Exception => 0
  }

  def toDoublePlus = try {
    str.toDouble
  } catch {
    case _: Exception => 0d
  }
}

object RichString {
  /**
   * Implicitly convert a String to a RichString
   * @param str
   * @return
   */
  implicit def str2RichString(str: String) = new RichString(str)
}
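
As a possible variation on RichString, the same defaulting behaviour can be sketched with `scala.util.Try` and an implicit value class, which avoids allocating a wrapper object per call and also trims whitespace before parsing integers. This is only an alternative sketch, not the author's code; the names `SafeParseSketch`, `SafeString`, `toIntOrZero` and `toDoubleOrZero` are made up for this example.

```scala
import scala.util.Try

object SafeParseSketch {
  // Hypothetical counterpart of RichString: falls back to 0 on any parse failure
  implicit class SafeString(val str: String) extends AnyVal {
    def toIntOrZero: Int = Try(str.trim.toInt).getOrElse(0)
    def toDoubleOrZero: Double = Try(str.trim.toDouble).getOrElse(0d)
  }

  def main(args: Array[String]): Unit = {
    println("42".toIntOrZero)      // 42
    println(" 7 ".toIntOrZero)     // 7 (whitespace trimmed first)
    println("".toIntOrZero)        // 0 (empty field)
    println("3.5".toDoubleOrZero)  // 3.5
    println("n/a".toDoubleOrZero)  // 0.0 (not a number)
  }
}
```

Note that silently mapping bad input to 0 makes an empty field indistinguishable from a real zero, which is the same trade-off the original `toIntPlus` and `toDoublePlus` make.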

Configure the Input and Output Parameters

Run the Bz2Parquet program; the console prints:

{
  "type" : "struct",
  "fields" : [ {
    "name" : "sessionid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "advertisersid",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adorderid",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adcreativeid",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adplatformproviderid",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "sdkversion",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adplatformkey",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "putinmodeltype",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "requestmode",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adprice",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adppprice",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "requestdate",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "ip",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "appid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "appname",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "uuid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "device",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "client",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "osversion",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "density",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "pw",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "ph",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "long",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "lat",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "provincename",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "cityname",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "ispid",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "ispname",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "networkmannerid",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "networkmannername",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "iseffective",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "isbilling",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adspacetype",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adspacetypename",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "devicetype",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "processnode",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "apptype",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "district",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "paymode",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "isbid",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "bidprice",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "winprice",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "iswin",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "cur",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "rate",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "cnywinprice",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "imei",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "mac",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "idfa",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "openudid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "androidid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "rtbprovince",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "rtbcity",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "rtbdistrict",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "rtbstreet",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "storeurl",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "realip",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "isqualityapp",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "bidfloor",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "aw",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "ah",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "imeimd5",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "macmd5",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "idfamd5",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "openudidmd5",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "androididmd5",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "imeisha1",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "macsha1",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "idfasha1",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "openudidsha1",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "androididsha1",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "uuidunknow",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "userid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "iptype",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "initbidprice",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adpayment",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "agentrate",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "lrate",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "adxrate",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "title",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "keywords",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "tagid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "callbackdate",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "channelid",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "mediatype",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  } ]
}

Check the result in the output path
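
Besides looking at the files on disk, the result can be checked by reading it back. The sketch below assumes the same Spark 1.6 dependencies as the POM above and the column names from the schema; the object name `ParquetCheck` and the method `requestsByProvince` are hypothetical. It selects only three of the 85 columns, so a columnar reader does not need to load the rest.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}

object ParquetCheck {
  /** Count raw requests (requestmode = 1) per province from the parquet output. */
  def requestsByProvince(sqlc: SQLContext, parquetPath: String): DataFrame =
    sqlc.read.parquet(parquetPath)
      .select("provincename", "cityname", "requestmode")
      .filter("requestmode = 1")
      .groupBy("provincename")
      .count()

  def main(args: Array[String]): Unit = {
    // Single argument: the parquetOutPath that was passed to Bz2Parquet
    val Array(parquetPath) = args
    val conf = new SparkConf().setAppName("ParquetCheck").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlc = new SQLContext(sc)

    // The schema is stored in the parquet files, so none has to be supplied here
    sqlc.read.parquet(parquetPath).printSchema()
    requestsByProvince(sqlc, parquetPath).show(10)

    sc.stop()
  }
}
```

If the conversion worked, `printSchema()` should list the same fields as the console output above.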

 

Source: https://blog.csdn.net/weixin_39868387/article/details/118270997