Migrating MySQL Data with DataX

2021-02-04 22:33:41


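DataX is an offline data synchronization tool/platform widely used within Alibaba Group. It provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, SQL Server, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), and DRDS.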

GitHub repository: https://github.com/alibaba/DataX

The Quick Start section on GitHub is here:

https://github.com/alibaba/DataX/blob/master/userGuid.md

1. Installation

  JDK 1.8.0_151

  Python 2.7.18

  Download DataX and extract it

2. Running

  Change into the bin directory and run:

python datax.py {YOUR_JOB.json}
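
The launch command above can also be assembled from a script, which helps when several jobs are run in sequence. The following is a minimal sketch, not part of DataX itself: the helper names and the /opt/datax install path are hypothetical, and the interpreter is passed in explicitly because this article runs datax.py under Python 2.7.

```python
import subprocess
import sys
from pathlib import Path


def build_datax_command(datax_home, job_file, python_exe=sys.executable):
    """Return the argument list equivalent to `python datax.py {YOUR_JOB.json}`."""
    launcher = Path(datax_home) / "bin" / "datax.py"
    return [str(python_exe), str(launcher), str(job_file)]


def run_job(datax_home, job_file, python_exe=sys.executable):
    """Run the job and return the launcher's exit code."""
    return subprocess.call(build_datax_command(datax_home, job_file, python_exe))


if __name__ == "__main__":
    # Hypothetical install location; adjust to wherever DataX was extracted.
    print(build_datax_command("/opt/datax", "stream.json", python_exe="python"))
```

Building the command as a list rather than a single string avoids shell quoting problems when the job path contains spaces.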

3. Simple output

(1) Create the job configuration file (JSON format)

  stream.json

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "sliceRecordCount": 2,
            "column": [
              {
                "type": "long",
                "value": "2"
              },
              {
                "type": "string",
                "value": "hello,world-DataX"
              }
            ]
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "encoding": "UTF-8",
            "print": true
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 3
      }
    }
  }
}
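
The same configuration can be generated from code when the record count or channel count needs to vary. This is a sketch only; the function names are made up for illustration. The sample run in this article reports "Total 6 records", which matches sliceRecordCount (2) times channel (3). Treating that product as the expected total is an inference from this one run, not a documented guarantee.

```python
import json


def make_stream_job(slice_record_count, channel):
    """Build the streamreader -> streamwriter job from this section as a dict."""
    return {
        "job": {
            "content": [{
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": slice_record_count,
                        "column": [
                            {"type": "long", "value": "2"},
                            {"type": "string", "value": "hello,world-DataX"},
                        ],
                    },
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {"encoding": "UTF-8", "print": True},
                },
            }],
            "setting": {"speed": {"channel": channel}},
        }
    }


def expected_records(job):
    """Records per slice times channels (assumption inferred from the sample run)."""
    reader = job["job"]["content"][0]["reader"]["parameter"]
    return reader["sliceRecordCount"] * job["job"]["setting"]["speed"]["channel"]


if __name__ == "__main__":
    job = make_stream_job(slice_record_count=2, channel=3)
    print(json.dumps(job, indent=2))
    print(expected_records(job))
```

Redirecting the printed JSON to stream.json gives a file that can be passed to datax.py as in the next step.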

(2) Start DataX

python datax.py ./stream.json

Output

……
2021-02-04 16:58:42.891 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2    hello,world-DataX
2    hello,world-DataX
2021-02-04 16:58:42.893 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] attemptCount[1] is started
2    hello,world-DataX
2    hello,world-DataX
2    hello,world-DataX
2    hello,world-DataX
2021-02-04 16:58:42.993 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[102]ms
2021-02-04 16:58:42.993 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[1] is successed, used[105]ms
2021-02-04 16:58:42.993 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] is successed, used[101]ms
2021-02-04 16:58:42.994 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2021-02-04 16:58:52.887 [job-0] INFO  StandAloneJobContainerCommunicator - Total 6 records, 108 bytes | Speed 10B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-02-04 16:58:52.888 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2021-02-04 16:58:52.889 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2021-02-04 16:58:52.889 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2021-02-04 16:58:52.889 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2021-02-04 16:58:52.891 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: D:\data\datax\hook
2021-02-04 16:58:52.893 [job-0] INFO  JobContainer - 
     [total cpu info] => 
        averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
        -1.00%                         | -1.00%                         | -1.00%
                        

     [total gc info] => 
         NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
         PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
         PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2021-02-04 16:58:52.894 [job-0] INFO  JobContainer - PerfTrace not enable!
2021-02-04 16:58:52.894 [job-0] INFO  StandAloneJobContainerCommunicator - Total 6 records, 108 bytes | Speed 10B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-02-04 16:58:52.896 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2021-02-04 16:58:42
任务结束时刻                    : 2021-02-04 16:58:52
任务总计耗时                    :                 10s
任务平均流量                    :               10B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                   6
读写失败总数                    :                   0

(The seven fields of the closing summary are, in order: job start time, job end time, total elapsed time, average throughput, record write speed, total records read, and total read/write failures.)

4. Reading MySQL data and printing it locally

mysqltest.json

{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "root",
                        "connection": [
                            {
                                "querySql": [
                                    "select id,name from sys_user limit 3;"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://localhost:3306/dream"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": true,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
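
The job above hands mysqlreader a complete SQL statement through querySql instead of listing table and column. As far as the mysqlreader doc describes it, querySql takes precedence when both styles are present. The sketch below classifies a reader parameter block accordingly; reader_mode is a hypothetical helper, not a DataX function.

```python
def reader_mode(reader_parameter):
    """Classify a mysqlreader parameter block as 'querySql' or 'table' mode."""
    connection = reader_parameter["connection"][0]
    # A non-empty querySql list wins over table/column (assumption based on the docs).
    if connection.get("querySql"):
        return "querySql"
    if connection.get("table") and reader_parameter.get("column"):
        return "table"
    raise ValueError("expected either querySql, or table plus column")


if __name__ == "__main__":
    query_style = {
        "username": "root",
        "password": "root",
        "connection": [{
            "querySql": ["select id,name from sys_user limit 3;"],
            "jdbcUrl": ["jdbc:mysql://localhost:3306/dream"],
        }],
    }
    print(reader_mode(query_style))
```

A check like this is mainly useful as a guard before submitting a job, since a parameter block with neither style is rejected early.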

Output

……
2021-02-04 17:00:24.237 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select id,name from sys_user limit 3;
] jdbcUrl:[jdbc:mysql://localhost:3306/dream?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
33    abc
1    admin
25    caocao
2021-02-04 17:00:24.291 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[102]ms
2021-02-04 17:00:24.291 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2021-02-04 17:00:34.187 [job-0] INFO  StandAloneJobContainerCommunicator - Total 3 records, 19 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-02-04 17:00:34.188 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2021-02-04 17:00:34.189 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2021-02-04 17:00:34.189 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do post work.
2021-02-04 17:00:34.189 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2021-02-04 17:00:34.191 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: D:\data\datax\hook
2021-02-04 17:00:34.194 [job-0] INFO  JobContainer - 
     [total cpu info] => 
        averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
        -1.00%                         | -1.00%                         | -1.00%
                        

     [total gc info] => 
         NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
         PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
         PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2021-02-04 17:00:34.194 [job-0] INFO  JobContainer - PerfTrace not enable!
2021-02-04 17:00:34.195 [job-0] INFO  StandAloneJobContainerCommunicator - Total 3 records, 19 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-02-04 17:00:34.196 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2021-02-04 17:00:23
任务结束时刻                    : 2021-02-04 17:00:34
任务总计耗时                    :                 10s
任务平均流量                    :                1B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                   3
读写失败总数                    :                   0
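
When DataX is driven from a script, the closing summary can be turned into a dictionary for logging or assertions. This is a rough sketch that assumes the summary keeps the exact label text shown in the logs of this article; the English key names are my own choice.

```python
import re

# English key names for the labels printed in the closing summary.
LABELS = {
    "任务启动时刻": "job_start_time",
    "任务结束时刻": "job_end_time",
    "任务总计耗时": "total_elapsed",
    "任务平均流量": "average_throughput",
    "记录写入速度": "record_write_speed",
    "读出记录总数": "total_records_read",
    "读写失败总数": "total_failures",
}

# "<label>   : <value>" with arbitrary padding around the colon.
LINE = re.compile(r"^\s*(\S+)\s*:\s*(.+?)\s*$")


def parse_summary(text):
    """Return the summary fields found in text, keyed by English name."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match and match.group(1) in LABELS:
            result[LABELS[match.group(1)]] = match.group(2)
    return result


SAMPLE = """\
任务启动时刻                    : 2021-02-04 17:00:23
任务结束时刻                    : 2021-02-04 17:00:34
任务总计耗时                    :                 10s
任务平均流量                    :                1B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                   3
读写失败总数                    :                   0
"""

if __name__ == "__main__":
    print(parse_summary(SAMPLE))
```

Values are kept as strings because units such as "10s" and "1B/s" are part of the output; convert them only where a number is actually needed.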

5. Migrating data between tables in different MySQL databases

migration.json

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["id","name"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/dream"],
                                "table": ["sys_user"]
                            }
                        ],
                        "password": "root",
                        "username": "root",
                        "where": "id>29"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["id","name"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/dream1",
                                "table": ["user"]
                            }
                        ],
                        "password": "root",
                        "username": "root",
                        "preSql": [],
                        "session": [],
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
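
In this job the reader is described by column, table, and where rather than by a full statement. To see roughly which rows will be migrated, the three settings can be combined into a SELECT. The helper below is illustrative only: it approximates the query and is not the exact statement DataX builds internally.

```python
def effective_select(reader_parameter):
    """Approximate the SELECT implied by column, table and where."""
    columns = ",".join(reader_parameter["column"])
    # Only the first table of the first connection is considered in this sketch.
    table = reader_parameter["connection"][0]["table"][0]
    sql = "select {} from {}".format(columns, table)
    where = reader_parameter.get("where")
    if where:
        sql += " where {}".format(where)
    return sql


if __name__ == "__main__":
    reader = {
        "column": ["id", "name"],
        "connection": [{
            "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/dream"],
            "table": ["sys_user"],
        }],
        "where": "id>29",
    }
    print(effective_select(reader))
```

Running the printed statement against the source database before the migration gives a quick preview of the rows that the where filter selects.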

Output

……
2021-02-04 17:10:17.061 [job-0] INFO  StandAloneJobContainerCommunicator - Total 4 records, 30 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-02-04 17:10:17.061 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2021-02-04 17:10:06
任务结束时刻                    : 2021-02-04 17:10:17
任务总计耗时                    :                 10s
任务平均流量                    :                3B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                   4
读写失败总数                    :                   0

Checking the user table in dream1 confirms that the rows have been inserted.
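
One way to make that check repeatable is to compare row counts on both sides. The sketch below works with any DB-API 2.0 connection. In-memory SQLite stands in for the two MySQL databases so the example runs on its own, and the rows are made up. In practice the connections would come from whichever MySQL driver is in use; no particular driver is assumed here.

```python
import sqlite3


def count_rows(conn, table, where=None):
    """Count rows through a DB-API 2.0 connection.

    table and where are interpolated as-is, so pass trusted values only.
    """
    sql = "SELECT COUNT(*) FROM {}".format(table)
    if where:
        sql += " WHERE {}".format(where)
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchone()[0]


def demo():
    """Mimic the migration with invented rows and return (source, target) counts."""
    src = sqlite3.connect(":memory:")
    dst = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE sys_user (id INTEGER, name TEXT)")
    dst.execute("CREATE TABLE user (id INTEGER, name TEXT)")
    rows = [(1, "a"), (30, "b"), (31, "c")]
    src.executemany("INSERT INTO sys_user VALUES (?, ?)", rows)
    # Copy only the rows matching the reader's where clause (id>29).
    dst.executemany("INSERT INTO user VALUES (?, ?)", [r for r in rows if r[0] > 29])
    return count_rows(src, "sys_user", "id>29"), count_rows(dst, "user")


if __name__ == "__main__":
    print(demo())
```

Matching counts do not prove the contents are identical, but a mismatch is a cheap early signal that the job needs another look.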

Other parameters are described in the doc directories under mysqlreader and mysqlwriter in the GitHub repository.

Source: https://www.cnblogs.com/baby123/p/14372147.html
