
1 Solution Overview
2 Prerequisites
3 Exporting Data from Redshift to S3
3.1 Introduction to UNLOAD
3.1.1 Command Syntax
UNLOAD can authenticate either with an IAM role or with access keys:

unload ('select * from venue')
to 's3://mybucket/tickit/unload/venue_'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';

unload ('select * from venue')
to 's3://mybucket/tickit/venue_'
access_key_id ''
secret_access_key ''
session_token '';
3.1.2 Exporting in the Default Format
By default, UNLOAD writes pipe-delimited (|) text files. For example:

unload ('select * from customer')
to 's3://xxx-bucket/unload_from_redshift/customer/customer_'
iam_role 'arn:aws:iam::xxx:role/redshift_s3_role';
3.1.3 Exporting in Parquet Format

unload ('select * from customer')
to 's3://xxx-bucket/unload_from_redshift/customer_parquet/customer_'
FORMAT AS PARQUET
iam_role 'arn:aws:iam::xxxx:role/redshift_s3_role';

3.2 Creating an IAM Role with S3 Read/Write Access
3.2.1 Create an IAM Role with the Redshift Use Case
3.2.2 Attach a Policy Granting S3 Read/Write Access
3.2.3 Name the IAM Role and Finish Creating It

3.2.4 Attach the IAM Role to the Redshift Cluster to Grant S3 Access
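The policy attached in step 3.2.2 needs to allow the role to list the unload bucket and write objects under the export prefix. A minimal sketch of such a policy, assuming the bucket name `xxx-bucket` and prefix `unload_from_redshift` used throughout this walkthrough:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::xxx-bucket/unload_from_redshift/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::xxx-bucket"
    }
  ]
}
```

In practice you can also attach the managed AmazonS3FullAccess policy for a quick test, then narrow it down as above.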
3.3 Unloading Data to Amazon S3
unload ('select * from customer') to 's3://xxx-bucket/unload_from_redshift/customer_parquet/customer_' FORMAT AS PARQUET iam_role 'arn:aws:iam::xxx:role/redshift_s3_role';
unload ('select * from orders') to 's3://xxx-bucket/unload_from_redshift/orders_parquet/orders_' FORMAT AS PARQUET iam_role 'arn:aws:iam::xxx:role/redshift_s3_role';
unload ('select * from lineitem') to 's3://xxx-bucket/unload_from_redshift/lineitem_parquet/lineitem_' FORMAT AS PARQUET iam_role 'arn:aws:iam::xxx:role/redshift_s3_role';
unload ('select * from nation') to 's3://xxx-bucket/unload_from_redshift/nation_parquet/nation_' FORMAT AS PARQUET iam_role 'arn:aws:iam::xxx:role/redshift_s3_role';
unload ('select * from part') to 's3://xxx-bucket/unload_from_redshift/part_parquet/part_' FORMAT AS PARQUET iam_role 'arn:aws:iam::xxx:role/redshift_s3_role';
unload ('select * from partsupp') to 's3://xxx-bucket/unload_from_redshift/partsupp_parquet/partsupp_' FORMAT AS PARQUET iam_role 'arn:aws:iam::xxx:role/redshift_s3_role';
unload ('select * from region') to 's3://xxx-bucket/unload_from_redshift/region_parquet/region_' FORMAT AS PARQUET iam_role 'arn:aws:iam::xxx:role/redshift_s3_role';
unload ('select * from supplier') to 's3://xxx-bucket/unload_from_redshift/supplier_parquet/supplier_' FORMAT AS PARQUET iam_role 'arn:aws:iam::xxx:role/redshift_s3_role';
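The eight statements above all follow one template. A small Python sketch can generate them from a table list instead of maintaining each one by hand; the bucket, prefix, and role ARN defaults below are the placeholders used in the text:

```python
TABLES = ["customer", "orders", "lineitem", "nation",
          "part", "partsupp", "region", "supplier"]

def build_unload(table, bucket="xxx-bucket",
                 prefix="unload_from_redshift",
                 role_arn="arn:aws:iam::xxx:role/redshift_s3_role"):
    # One UNLOAD ... FORMAT AS PARQUET statement per table; the key ends in
    # '<table>_' so Redshift can append its own file-part suffixes.
    return (
        f"unload ('select * from {table}') "
        f"to 's3://{bucket}/{prefix}/{table}_parquet/{table}_' "
        "FORMAT AS PARQUET "
        f"iam_role '{role_arn}';"
    )

statements = [build_unload(t) for t in TABLES]
for s in statements:
    print(s)
```

Run the printed statements in your Redshift SQL client, or submit them through whatever driver you already use to reach the cluster.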
4 Migrating Data from AWS S3 to Alibaba Cloud OSS
4.1 Using the Data Online Migration Service for the S3-to-OSS Migration
4.1.1 Preparing Alibaba Cloud OSS
- Log in to the RAM console.
- In the left navigation pane, choose Identities > Users > Create User.
- Select both Console Access and Programmatic Access, then fill in the user account information.
- Save the generated account name, password, AccessKey ID, and AccessKey Secret.
- Select the user logon name, click Add Permissions, and grant the RAM user full OSS read/write access (AliyunOSSFullAccess) and Data Online Migration management permissions (AliyunMGWFullAccess).
- After authorization, open Overview in the left navigation pane, follow the user logon URL, and log in with the user name and password of the RAM user you just created.
4.1.2 Preparing an AWS IAM Account with Programmatic Access to S3
- Estimate the migration volume: in the AWS console, confirm the storage size and object count of the data in S3.
- Create an IAM account with programmatic access to S3.
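The migration service only needs to read the exported objects, so the IAM account can be scoped to read-only access on the unload bucket. A minimal sketch of such a policy, again assuming the placeholder bucket `xxx-bucket`:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": [
        "arn:aws:s3:::xxx-bucket",
        "arn:aws:s3:::xxx-bucket/*"
      ]
    }
  ]
}
```

Save this account's Access Key Id and Secret Access Key; they are entered into the source data address in section 4.2.1.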
4.1.3 Creating an Online Migration Job
4.2 Migrating the Data with the Online Migration Service
4.2.1 Defining the Source and Target Data Addresses
Source data address:
- Data Type: AWS-S3
- Data Name: an alias for the data source
- Endpoint: the S3 endpoint (see the endpoint selection reference)
- Bucket: the name of the bucket that Redshift unloaded to
- Prefix: in this solution all Redshift data was exported under the single directory unload_from_redshift; enter that directory to migrate everything under it
- Access Key Id: the Access Key Id of an account with access to that S3 bucket and prefix (the programmatic-access IAM account prepared above)
- Secret Access Key: the Secret Access Key of the same account
Target data address:
- Data Type: OSS
- Data Name: an alias for the data target
- Endpoint: the OSS endpoint
- Bucket: the name of the target OSS bucket
- Prefix: the target path inside the bucket; in this solution the migrated data is written under unload_from_redshift
- Access Key Id: the Access Key Id of an account with access to that OSS bucket and prefix (the RAM user created in the OSS preparation section above)
- Secret Access Key: the Secret Access Key of the same account
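Both address forms share the same six fields, so it can help to draft them as plain records and sanity-check completeness before creating the job in the console. A minimal sketch; every value below is a placeholder to be replaced with your own endpoints, buckets, and keys:

```python
REQUIRED = {"data_name", "endpoint", "bucket", "prefix",
            "access_key_id", "secret_access_key"}

source_s3 = {
    "data_name": "redshift-unload-s3",          # alias for the source
    "endpoint": "s3.us-east-1.amazonaws.com",   # endpoint of the bucket's region
    "bucket": "xxx-bucket",
    "prefix": "unload_from_redshift/",
    "access_key_id": "<AWS_ACCESS_KEY_ID>",
    "secret_access_key": "<AWS_SECRET_ACCESS_KEY>",
}

target_oss = {
    "data_name": "maxcompute-load-oss",         # alias for the target
    "endpoint": "oss-ap-southeast-1.aliyuncs.com",
    "bucket": "<oss-bucket>",
    "prefix": "unload_from_redshift/",
    "access_key_id": "<ALIYUN_ACCESS_KEY_ID>",
    "secret_access_key": "<ALIYUN_ACCESS_KEY_SECRET>",
}

def missing_fields(cfg):
    # Return the required fields that are absent or empty.
    return {k for k in REQUIRED if not cfg.get(k)}
```

This is only a checklist aid; the actual data addresses are still created in the Data Online Migration console.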
4.2.2 Creating the Migration Job
- Source Data Address: the S3 data address defined above
- Destination Data Address: the OSS data address defined above
- Migration Type: Full, for this solution
5 Loading OSS Data Directly into MaxCompute
5.1 Creating the MaxCompute Tables
--MaxCompute DDL
CREATE TABLE customer(
  C_CustKey int,
  C_Name varchar(64),
  C_Address varchar(64),
  C_NationKey int,
  C_Phone varchar(64),
  C_AcctBal decimal(13, 2),
  C_MktSegment varchar(64),
  C_Comment varchar(120),
  skip varchar(64)
);
CREATE TABLE lineitem(
  L_OrderKey int,
  L_PartKey int,
  L_SuppKey int,
  L_LineNumber int,
  L_Quantity int,
  L_ExtendedPrice decimal(13, 2),
  L_Discount decimal(13, 2),
  L_Tax decimal(13, 2),
  L_ReturnFlag varchar(64),
  L_LineStatus varchar(64),
  L_ShipDate timestamp,
  L_CommitDate timestamp,
  L_ReceiptDate timestamp,
  L_ShipInstruct varchar(64),
  L_ShipMode varchar(64),
  L_Comment varchar(64),
  skip varchar(64)
);
CREATE TABLE nation(
  N_NationKey int,
  N_Name varchar(64),
  N_RegionKey int,
  N_Comment varchar(160),
  skip varchar(64)
);
CREATE TABLE orders(
  O_OrderKey int,
  O_CustKey int,
  O_OrderStatus varchar(64),
  O_TotalPrice decimal(13, 2),
  O_OrderDate timestamp,
  O_OrderPriority varchar(15),
  O_Clerk varchar(64),
  O_ShipPriority int,
  O_Comment varchar(80),
  skip varchar(64)
);
CREATE TABLE part(
  P_PartKey int,
  P_Name varchar(64),
  P_Mfgr varchar(64),
  P_Brand varchar(64),
  P_Type varchar(64),
  P_Size int,
  P_Container varchar(64),
  P_RetailPrice decimal(13, 2),
  P_Comment varchar(64),
  skip varchar(64)
);
CREATE TABLE partsupp(
  PS_PartKey int,
  PS_SuppKey int,
  PS_AvailQty int,
  PS_SupplyCost decimal(13, 2),
  PS_Comment varchar(200),
  skip varchar(64)
);
CREATE TABLE region(
  R_RegionKey int,
  R_Name varchar(64),
  R_Comment varchar(160),
  skip varchar(64)
);
CREATE TABLE supplier(
  S_SuppKey int,
  S_Name varchar(64),
  S_Address varchar(64),
  S_NationKey int,
  S_Phone varchar(18),
  S_AcctBal decimal(13, 2),
  S_Comment varchar(105),
  skip varchar(64)
);
5.2 Loading OSS Data into the MaxCompute Tables with the LOAD Command
5.2.1 Creating a RAM Role with OSS Access
LOAD overwrite table orders
from LOCATION 'oss://:@oss-ap-southeast-1-internal.aliyuncs.com/sg-migration/unload_from_redshift/orders_parquet/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS PARQUET;
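For the role-based variant used in the next section, the RAM role's trust policy must allow MaxCompute to assume it. A sketch of such a trust policy, assuming (to the best of my understanding of the Alibaba Cloud docs, so verify against the current documentation) that the trusted service for MaxCompute is odps.aliyuncs.com:

```json
{
  "Version": "1",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Principal": {
        "Service": ["odps.aliyuncs.com"]
      }
    }
  ]
}
```

The role additionally needs a permissions policy granting read access to the OSS bucket holding the migrated data.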
5.2.2 Loading the Data with the LOAD Command

LOAD overwrite table orders
from LOCATION 'oss://endpoint/bucket/unload_from_redshift/orders_parquet/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('odps.properties.rolearn'='acs:ram::xxx:role/xxx_role')
STORED AS PARQUET;
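As with the UNLOAD step, the same LOAD statement is needed once per table, so it can be generated from the table list. A sketch following the orders example above; the endpoint, bucket, and role ARN defaults are placeholders from the text:

```python
TABLES = ["customer", "orders", "lineitem", "nation",
          "part", "partsupp", "region", "supplier"]

LOAD_TEMPLATE = (
    "LOAD overwrite table {table}\n"
    "from LOCATION 'oss://{endpoint}/{bucket}/unload_from_redshift/{table}_parquet/'\n"
    "ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
    "WITH SERDEPROPERTIES ('odps.properties.rolearn'='{role_arn}')\n"
    "STORED AS PARQUET;"
)

def build_load(table,
               endpoint="oss-ap-southeast-1-internal.aliyuncs.com",
               bucket="sg-migration",
               role_arn="acs:ram::xxx:role/xxx_role"):
    # One LOAD statement per table, pointing at that table's Parquet prefix.
    return LOAD_TEMPLATE.format(table=table, endpoint=endpoint,
                                bucket=bucket, role_arn=role_arn)

load_statements = [build_load(t) for t in TABLES]
for s in load_statements:
    print(s)
```

Run the printed statements in the MaxCompute console or through odpscmd.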
6 Verifying Data Integrity and Correctness
6.1 Run a Query Job on the Redshift Cluster
select l_returnflag, l_linestatus,
  sum(l_quantity) as sum_qty,
  sum(l_extendedprice) as sum_base_price,
  sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
  sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
  avg(l_quantity) as avg_qty,
  avg(l_extendedprice) as avg_price,
  avg(l_discount) as avg_disc,
  count(*) as count_order
from lineitem
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
6.2 Run the Same Query in MaxCompute and Compare the Results
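The two result sets can be compared programmatically rather than by eye. A minimal sketch, assuming each result row is a tuple of (l_returnflag, l_linestatus, aggregate columns...); the numeric columns are compared with a tolerance because decimal rounding in averages can differ slightly between engines:

```python
from math import isclose

def rows_match(redshift_rows, maxcompute_rows, rel_tol=1e-9):
    # Key each row by (l_returnflag, l_linestatus) and compare the remaining
    # aggregate columns pairwise with a relative tolerance.
    def keyed(rows):
        return {(r[0], r[1]): r[2:] for r in rows}
    a, b = keyed(redshift_rows), keyed(maxcompute_rows)
    if a.keys() != b.keys():
        return False
    return all(
        isclose(x, y, rel_tol=rel_tol)
        for key in a
        for x, y in zip(a[key], b[key])
    )
```

Feed it the fetched rows from both systems; a True result means every group's aggregates agree within tolerance.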




