ORC File

2021-12-06 14:03:40

Tags: decimal, sales, sk, catalog, bigint, File, cs, ORC


Storing a table as ORC can reduce the amount of data read from HDFS.

The catalog_sales table stored in ORC format occupies 151644639 bytes on HDFS.

hive> SHOW CREATE TABLE tpcds_bin_partitioned_orc_2.catalog_sales;
OK
CREATE TABLE `tpcds_bin_partitioned_orc_2.catalog_sales`(
  `cs_sold_time_sk` bigint, 
  `cs_ship_date_sk` bigint, 
  `cs_bill_customer_sk` bigint, 
  `cs_bill_cdemo_sk` bigint, 
  `cs_bill_hdemo_sk` bigint, 
  `cs_bill_addr_sk` bigint, 
  `cs_ship_customer_sk` bigint, 
  `cs_ship_cdemo_sk` bigint, 
  `cs_ship_hdemo_sk` bigint, 
  `cs_ship_addr_sk` bigint, 
  `cs_call_center_sk` bigint, 
  `cs_catalog_page_sk` bigint, 
  `cs_ship_mode_sk` bigint, 
  `cs_warehouse_sk` bigint, 
  `cs_item_sk` bigint, 
  `cs_promo_sk` bigint, 
  `cs_order_number` bigint, 
  `cs_quantity` int, 
  `cs_wholesale_cost` decimal(7,2), 
  `cs_list_price` decimal(7,2), 
  `cs_sales_price` decimal(7,2), 
  `cs_ext_discount_amt` decimal(7,2), 
  `cs_ext_sales_price` decimal(7,2), 
  `cs_ext_wholesale_cost` decimal(7,2), 
  `cs_ext_list_price` decimal(7,2), 
  `cs_ext_tax` decimal(7,2), 
  `cs_coupon_amt` decimal(7,2), 
  `cs_ext_ship_cost` decimal(7,2), 
  `cs_net_paid` decimal(7,2), 
  `cs_net_paid_inc_tax` decimal(7,2), 
  `cs_net_paid_inc_ship` decimal(7,2), 
  `cs_net_paid_inc_ship_tax` decimal(7,2), 
  `cs_net_profit` decimal(7,2))
PARTITIONED BY ( 
  `cs_sold_date_sk` bigint)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://localhost:9000/user/hive/warehouse/tpcds_bin_partitioned_orc_2.db/catalog_sales'
TBLPROPERTIES (
  'bucketing_version'='2', 
  'transient_lastDdlTime'='1628754485')
Time taken: 0.051 seconds, Fetched: 47 row(s)

hive> dfs -du -s hdfs://localhost:9000/user/hive/warehouse/tpcds_bin_partitioned_orc_2.db/catalog_sales;
151644639  151644639  hdfs://localhost:9000/user/hive/warehouse/tpcds_bin_partitioned_orc_2.db/catalog_sales
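
For readers unfamiliar with the `dfs -du -s` output: in current Hadoop releases the first column is the total size in bytes, the second is the disk space consumed across all replicas, and the third is the path. The two numbers are equal here, which is consistent with a replication factor of 1. The following is a minimal sketch in Python for splitting such a line; the helper name `parse_du_line` is hypothetical and not part of Hadoop or Hive.

```python
def parse_du_line(line):
    """Split one line of `dfs -du -s` output into (size, consumed, path).

    Assumes the three-column layout: size in bytes, disk space consumed
    across all replicas, and the full path.
    """
    size, consumed, path = line.split(None, 2)
    return int(size), int(consumed), path.strip()


# The line reported for the ORC table in the session above.
du_line = (
    "151644639  151644639  "
    "hdfs://localhost:9000/user/hive/warehouse/"
    "tpcds_bin_partitioned_orc_2.db/catalog_sales"
)

size, consumed, path = parse_du_line(du_line)
# consumed / size approximates the effective replication factor.
print(size, consumed, consumed / size)
```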

For a count query, the number of bytes read from HDFS is much smaller than the table's size on disk:

set hive.compute.query.using.stats=false;
select count(1) from tpcds_bin_partitioned_orc_2.catalog_sales;
  VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
----------------------------------------------------------------------------------------------
     Map 1          10082.00         38,630            868       2,880,058                4
 Reducer 2              0.00            460              0               4                0

TEZ Counters: HDFS_BYTES_READ 30863152

Parquet file

create table parquet.catalog_sales stored as parquet as select * from tpcds_bin_partitioned_orc_2.catalog_sales;
select count(1) from parquet.catalog_sales;
Task Execution Summary
----------------------------------------------------------------------------------------------
  VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
----------------------------------------------------------------------------------------------
     Map 1           4020.00         12,620            601       2,880,058               12
 Reducer 2             87.00            680              0              12                0

hive> dfs -du -s hdfs://localhost:9000/user/hive/warehouse/parquet.db/catalog_sales;
243755174  243755174  hdfs://localhost:9000/user/hive/warehouse/parquet.db/catalog_sales

TEZ Counters: HDFS_BYTES_READ 243795493

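The HDFS_BYTES_READ counters indicate that the ORC query read only about 20% of the table's size on disk (30863152 of 151644639 bytes), whereas the Parquet query read roughly the entire table (243795493 of 243755174 bytes).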
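
The sizes and HDFS_BYTES_READ counters reported above can be compared directly. The following is a minimal sketch in Python that uses only the numbers from the session output; the function name `read_fraction` is hypothetical and chosen for illustration.

```python
def read_fraction(bytes_read, bytes_on_disk):
    """Return the share of the on-disk table size that the query read."""
    return bytes_read / bytes_on_disk


# HDFS_BYTES_READ and `dfs -du -s` sizes taken from the runs above.
orc = read_fraction(30_863_152, 151_644_639)
parquet = read_fraction(243_795_493, 243_755_174)

print(f"ORC:     {orc:.1%} of the table size was read")
print(f"Parquet: {parquet:.1%} of the table size was read")
```

A value slightly above 100% for Parquet means the counter was marginally larger than the table's size reported by `dfs -du -s`.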

Source: https://blog.csdn.net/houzhizhen/article/details/121743808
