一、为什么DWS层需要宽表建模?
在数据仓库中,DWS层(Data Warehouse Service) 是连接明细数据(DWD层)与上层应用(ADS层)的核心枢纽。它的核心目标是:
轻度聚合:对DWD层的明细数据进行按主题的聚合(如按天、按区域、按商品类目)。
服务化接口:为报表、BI工具、实时看板等提供标准化的宽表数据。
性能优化:通过预计算和索引设计,减少上层应用的查询复杂度。
宽表建模 是DWS层的核心手段,其本质是将多个维度表和事实表的字段整合到一张表中,形成"即插即用"的数据服务层。
二、10个经典宽表建模场景与完整代码
场景:电商/零售企业需要分析商品销售趋势、库存周转率、复购率等指标。
1)核心字段
|
|
|
|
|
|---|---|---|---|
goods_id |
|
|
|
category_id |
|
|
|
category_name |
|
|
|
brand_id |
|
|
|
brand_name |
|
|
|
price |
|
|
|
sale_count |
|
|
|
sale_amount |
|
|
|
return_rate |
|
|
|
inventory |
|
|
|
inventory_turnover_days |
|
|
|
repurchase_rate |
|
|
|
update_time |
|
|
|
2)建表SQL
CREATE TABLE dws_goods_detail (goods_id VARCHAR(32) COMMENT'商品唯一ID',category_id VARCHAR(16) COMMENT'商品类目ID',category_name VARCHAR(64) COMMENT'商品类目名称',brand_id VARCHAR(16) COMMENT'品牌ID',brand_name VARCHAR(32) COMMENT'品牌名称',price DECIMAL(10,2) COMMENT'商品价格',sale_count INTCOMMENT'销售数量(日)',sale_amount DECIMAL(12,2) COMMENT'销售金额(日)',return_rate DECIMAL(5,2) COMMENT'退货率(%)',inventory INTCOMMENT'当前库存量',inventory_turnover_days DECIMAL(5,2) COMMENT'库存周转天数',repurchase_rate DECIMAL(5,2) COMMENT'复购率(%)',update_time DATETIME COMMENT'数据更新时间',PRIMARY KEY (goods_id, update_time)) COMMENT'商品主题宽表'DISTRIBUTEDBYHASH(goods_id) BUCKETS 32PROPERTIES ("replication_num" = "1");
3)开发SQL代码
INSERT INTO dws_goods_detailSELECTg.goods_id,g.category_id,c.category_name,g.brand_id,b.brand_name,g.price,SUM(o.quantity) AS sale_count,SUM(o.quantity * o.price) AS sale_amount,ROUND(SUM(o.return_quantity) * 100.0 / NULLIF(SUM(o.quantity), 0), 2) AS return_rate,i.inventory,ROUND(30.0 * SUM(o.quantity) / NULLIF(SUM(o.quantity), 0), 2) AS inventory_turnover_days,ROUND(COUNT(DISTINCT o.user_id) * 100.0 / NULLIF(COUNT(DISTINCT o.user_id), 0), 2) AS repurchase_rate,NOW() AS update_timeFROM dwd_order_detail oJOIN dwd_goods_info g ON o.goods_id = g.goods_idJOIN dwd_category_info c ON g.category_id = c.category_idJOIN dwd_brand_info b ON g.brand_id = b.brand_idJOIN dwd_inventory i ON g.goods_id = i.goods_idWHERE o.order_date = CURDATE()GROUPBY g.goods_id, g.category_id, c.category_name, g.brand_id, b.brand_name, g.price, i.inventory;
场景:用户画像分析、会员等级管理、营销效果评估。
1)核心字段
|
|
|
|
|
|---|---|---|---|
user_id |
|
|
|
register_channel |
|
|
|
member_level |
|
|
|
region_id |
|
|
|
region_name |
|
|
|
device_type |
|
|
|
active_days |
|
|
|
order_count |
|
|
|
avg_order_amount |
|
|
|
lifecycle_value |
|
|
|
last_order_date |
|
|
|
update_time |
|
|
|
2)建表SQL
CREATE TABLE dws_user_detail (user_id VARCHAR(32) COMMENT'用户唯一ID',register_channel VARCHAR(32) COMMENT'注册渠道',member_level VARCHAR(16) COMMENT'会员等级',region_id VARCHAR(16) COMMENT'地域ID',region_name VARCHAR(64) COMMENT'地域名称',device_type VARCHAR(16) COMMENT'设备类型',active_days INTCOMMENT'活跃天数(30天)',order_count INTCOMMENT'下单次数(30天)',avg_order_amount DECIMAL(10,2) COMMENT'平均客单价',lifecycle_value DECIMAL(12,2) COMMENT'生命周期价值',last_order_date DATECOMMENT'最后下单日期',update_time DATETIME COMMENT'数据更新时间',PRIMARY KEY (user_id)) COMMENT'用户主题宽表'DISTRIBUTEDBYHASH(user_id) BUCKETS 32PROPERTIES ("replication_num" = "1");
3)数据SQL
INSERT INTO dws_user_detailSELECTu.user_id,u.register_channel,u.member_level,u.region_id,r.region_name,u.device_type,COUNT(DISTINCT o.order_date) AS active_days,COUNT(o.order_id) AS order_count,AVG(o.amount) AS avg_order_amount,SUM(o.amount) AS lifecycle_value,MAX(o.order_date) AS last_order_date,NOW() AS update_timeFROM dwd_user_info uJOIN dwd_region_info r ON u.region_id = r.region_idJOIN dwd_order_detail o ON u.user_id = o.user_idWHERE o.order_date >= DATE_SUB(CURRENT_DATE, 30)GROUPBY u.user_id, u.register_channel, u.member_level, u.region_id, r.region_name, u.device_type;
场景:订单履约分析、退款率监控、供应链优化。
1)核心字段设计
|
|
|
|
|
|---|---|---|---|
order_id |
|
|
|
order_date |
|
|
|
pay_method |
|
|
|
delivery_channel |
|
|
|
order_status |
|
|
|
amount |
|
|
|
refund_amount |
|
|
|
refund_rate |
|
|
|
delivery_time |
|
|
|
delivery_cost |
|
|
|
update_time |
|
|
|
2)建表SQL
CREATE TABLE dws_order_detail (order_id VARCHAR(32) COMMENT'订单ID',order_date DATECOMMENT'订单日期',pay_method VARCHAR(16) COMMENT'支付方式',delivery_channel VARCHAR(32) COMMENT'配送渠道',order_status VARCHAR(16) COMMENT'订单状态',amount DECIMAL(12,2) COMMENT'订单金额',refund_amount DECIMAL(12,2) COMMENT'退款金额',refund_rate DECIMAL(5,2) COMMENT'退款率(%)',delivery_time INTCOMMENT'配送时长(小时)',delivery_cost DECIMAL(8,2) COMMENT'物流成本',update_time DATETIME COMMENT'数据更新时间',PRIMARY KEY (order_id)) COMMENT'订单主题宽表'DISTRIBUTEDBYHASH(order_id) BUCKETS 32PROPERTIES ("replication_num" = "1");
3)数据SQL示例
INSERT INTO dws_order_detailSELECTo.order_id,o.order_date,o.pay_method,o.delivery_channel,o.order_status,o.amount,COALESCE(r.refund_amount, 0) AS refund_amount,ROUND(COALESCE(r.refund_amount, 0) * 100.0 / NULLIF(o.amount, 0), 2) AS refund_rate,DATEDIFF(o.delivery_time, o.order_time) * 24 + EXTRACT(HOURFROM (o.delivery_time - o.order_time)) AS delivery_time,d.delivery_cost,NOW() AS update_timeFROM dwd_order_fact oLEFTJOIN dwd_refund_info r ON o.order_id = r.order_idLEFTJOIN dwd_delivery_info d ON o.order_id = d.order_idWHERE o.order_date = CURDATE();
场景:区域市场分析、门店选址评估、区域库存分配。
1)核心字段设计
|
|
|
|
|
|---|---|---|---|
region_id |
|
|
|
province |
|
|
|
city |
|
|
|
district |
|
|
|
store_id |
|
|
|
sales_amount |
|
|
|
customer_count |
|
|
|
conversion_rate |
|
|
|
inventory_turnover |
|
|
|
avg_order_value |
|
|
|
update_time |
|
|
|
2)建表SQL
CREATE TABLE dws_region_detail (region_id VARCHAR(16) COMMENT'地域ID',province VARCHAR(32) COMMENT'省份',city VARCHAR(32) COMMENT'城市',district VARCHAR(32) COMMENT'区域',store_id VARCHAR(16) COMMENT'门店ID',sales_amount DECIMAL(12,2) COMMENT'销售额(日)',customer_count INTCOMMENT'客户数(日)',conversion_rate DECIMAL(5,2) COMMENT'转化率(%)',inventory_turnover DECIMAL(5,2) COMMENT'库存周转率',avg_order_value DECIMAL(10,2) COMMENT'平均订单价值',update_time DATETIME COMMENT'数据更新时间',PRIMARY KEY (region_id, store_id)) COMMENT'地区主题宽表'DISTRIBUTEDBYHASH(region_id) BUCKETS 32PROPERTIES ("replication_num" = "1");
3)开发SQL示例
INSERT INTO dws_region_detailSELECTr.region_id,r.province,r.city,r.district,s.store_id,SUM(o.amount) AS sales_amount,COUNT(DISTINCT o.user_id) AS customer_count,ROUND(COUNT(DISTINCT o.user_id) * 100.0 / NULLIF(COUNT(o.order_id), 0), 2) AS conversion_rate,ROUND(SUM(o.quantity) * 30.0 / NULLIF(SUM(i.inventory), 0), 2) AS inventory_turnover,AVG(o.amount) AS avg_order_value,NOW() AS update_timeFROM dwd_order_detail oJOIN dwd_region_info r ON o.region_id = r.region_idJOIN dwd_store_info s ON r.region_id = s.region_idJOIN dwd_inventory i ON o.goods_id = i.goods_idWHERE o.order_date = CURDATE()GROUPBY r.region_id, r.province, r.city, r.district, s.store_id;
场景:营销渠道效果分析、ROI计算、投放策略优化。
1)核心字段设计
|
|
|
|
|
|---|---|---|---|
channel_id |
|
|
|
channel_name |
|
|
|
campaign_id |
|
|
|
campaign_name |
|
|
|
impressions |
|
|
|
clicks |
|
|
|
click_rate |
|
|
|
conversions |
|
|
|
conversion_rate |
|
|
|
cac |
|
|
|
update_time |
|
|
|
2)建表SQL
CREATE TABLE dws_channel_detail (channel_id VARCHAR(16) COMMENT'渠道ID',channel_name VARCHAR(32) COMMENT'渠道名称',campaign_id VARCHAR(16) COMMENT'活动ID',campaign_name VARCHAR(64) COMMENT'活动名称',impressions BIGINTCOMMENT'曝光量',clicks BIGINTCOMMENT'点击量',click_rate DECIMAL(5,2) COMMENT'点击率(%)',conversions INTCOMMENT'转化量',conversion_rate DECIMAL(5,2) COMMENT'转化率(%)',cac DECIMAL(10,2) COMMENT'获客成本',update_time DATETIME COMMENT'数据更新时间',PRIMARY KEY (channel_id, campaign_id)) COMMENT'渠道主题宽表'DISTRIBUTEDBYHASH(channel_id) BUCKETS 32PROPERTIES ("replication_num" = "1");
3)开发SQL代码
INSERT INTO dws_channel_detailSELECTc.channel_id,c.channel_name,cam.campaign_id,cam.campaign_name,SUM(a.impressions) AS impressions,SUM(a.clicks) AS clicks,ROUND(SUM(a.clicks) * 100.0 / NULLIF(SUM(a.impressions), 0), 2) AS click_rate,COUNT(DISTINCT o.order_id) AS conversions,ROUND(COUNT(DISTINCT o.order_id) * 100.0 / NULLIF(SUM(a.clicks), 0), 2) AS conversion_rate,ROUND(SUM(a.cost) / NULLIF(COUNT(DISTINCT o.user_id), 0), 2) AS cac,NOW() AS update_timeFROM dwd_ad_activity aJOIN dwd_channel_info c ON a.channel_id = c.channel_idJOIN dwd_campaign_info cam ON a.campaign_id = cam.campaign_idLEFTJOIN dwd_order_detail o ON a.user_id = o.user_idWHERE a.date = CURDATE()GROUPBY c.channel_id, c.channel_name, cam.campaign_id, cam.campaign_name;
场景:供应商绩效评估、采购成本分析、库存周转监控。
1)核心字段设计
|
|
|
|
|
|---|---|---|---|
supplier_id |
|
|
|
supplier_name |
|
|
|
category_id |
|
|
|
category_name |
|
|
|
purchase_amount |
|
|
|
delivery_on_time_rate |
|
|
|
inventory_turnover |
|
|
|
purchase_cost |
|
|
|
lead_time |
|
|
|
update_time |
|
|
|
2)建表SQL
CREATE TABLE dws_supplier_detail (supplier_id VARCHAR(16) COMMENT'供应商ID',supplier_name VARCHAR(64) COMMENT'供应商名称',category_id VARCHAR(16) COMMENT'采购品类ID',category_name VARCHAR(64) COMMENT'采购品类名称',purchase_amount DECIMAL(12,2) COMMENT'采购金额(月)',delivery_on_time_rate DECIMAL(5,2) COMMENT'交货准时率(%)',inventory_turnover DECIMAL(5,2) COMMENT'库存周转率',purchase_cost DECIMAL(10,2) COMMENT'采购成本',lead_time INTCOMMENT'采购提前期(天)',update_time DATETIME COMMENT'数据更新时间',PRIMARY KEY (supplier_id, category_id)) COMMENT'供应链主题宽表'DISTRIBUTEDBYHASH(supplier_id) BUCKETS 32PROPERTIES ("replication_num" = "1");
3)开发SQL示例
INSERT INTO dws_supplier_detailSELECTs.supplier_id,s.supplier_name,p.category_id,c.category_name,SUM(p.amount) AS purchase_amount,ROUND(SUM(CASEWHEN p.delivery_date <= p.expected_delivery_date THEN1ELSE0END) * 100.0 / NULLIF(COUNT(p.purchase_id), 0), 2) AS delivery_on_time_rate,ROUND(SUM(p.quantity) * 30.0 / NULLIF(SUM(i.inventory), 0), 2) AS inventory_turnover,AVG(p.unit_price) AS purchase_cost,AVG(DATEDIFF(p.delivery_date, p.purchase_date)) AS lead_time,NOW() AS update_timeFROM dwd_purchase_order pJOIN dwd_supplier_info s ON p.supplier_id = s.supplier_idJOIN dwd_category_info c ON p.category_id = c.category_idJOIN dwd_inventory i ON p.goods_id = i.goods_idWHERE p.purchase_date >= DATE_SUB(CURRENT_DATE, 30)GROUPBY s.supplier_id, s.supplier_name, p.category_id, c.category_name;
场景:客服响应率分析、投诉率监控、客户满意度评估。
1)核心字段设计
|
|
|
|
|
|---|---|---|---|
service_id |
|
|
|
service_channel |
|
|
|
service_type |
|
|
|
agent_id |
|
|
|
response_time |
|
|
|
resolution_rate |
|
|
|
complaint_count |
|
|
|
satisfaction_score |
|
|
|
update_time |
|
|
|
2)建表SQL
CREATE TABLE dws_service_detail (service_id VARCHAR(32) COMMENT'服务ID',service_channel VARCHAR(16) COMMENT'服务渠道',service_type VARCHAR(32) COMMENT'问题类型',agent_id VARCHAR(16) COMMENT'客服ID',response_time INTCOMMENT'响应时长(分钟)',resolution_rate DECIMAL(5,2) COMMENT'解决率(%)',complaint_count INTCOMMENT'投诉次数',satisfaction_score DECIMAL(5,2) COMMENT'满意度评分',update_time DATETIME COMMENT'数据更新时间',PRIMARY KEY (service_id)) COMMENT'客户服务主题宽表'DISTRIBUTEDBYHASH(service_id) BUCKETS 32PROPERTIES ("replication_num" = "1");
3)SQL代码
INSERT INTO dws_service_detailSELECTs.service_id,s.service_channel,s.service_type,s.agent_id,AVG(TIMESTAMPDIFF(MINUTE, s.create_time, s.response_time)) AS response_time,ROUND(SUM(CASEWHEN s.status = '已解决'THEN1ELSE0END) * 100.0 / NULLIF(COUNT(*), 0), 2) AS resolution_rate,COUNT(CASEWHEN s.complaint = 1THEN1END) AS complaint_count,AVG(s.satisfaction_score) AS satisfaction_score,NOW() AS update_timeFROM dwd_service_record sWHERE s.create_time >= DATE_SUB(CURRENT_DATE, 7)GROUPBY s.service_id, s.service_channel, s.service_type, s.agent_id;
场景:成本核算、利润分析、现金流监控。
1)核心字段设计
|
|
|
|
|
|---|---|---|---|
finance_id |
|
|
|
department |
|
|
|
project_id |
|
|
|
project_name |
|
|
|
revenue |
|
|
|
cost |
|
|
|
gross_profit |
|
|
|
gross_profit_rate |
|
|
|
cash_flow |
|
|
|
update_time |
|
|
|
2)建表SQL
CREATE TABLE dws_finance_detail (finance_id VARCHAR(16) COMMENT'财务ID',department VARCHAR(32) COMMENT'部门',project_id VARCHAR(16) COMMENT'项目ID',project_name VARCHAR(64) COMMENT'项目名称',revenue DECIMAL(12,2) COMMENT'收入',costDECIMAL(12,2) COMMENT'成本',gross_profit DECIMAL(12,2) COMMENT'毛利',gross_profit_rate DECIMAL(5,2) COMMENT'毛利率(%)',cash_flow DECIMAL(12,2) COMMENT'现金流',update_time DATETIME COMMENT'数据更新时间',PRIMARY KEY (finance_id)) COMMENT'财务主题宽表'DISTRIBUTEDBYHASH(finance_id) BUCKETS 32PROPERTIES ("replication_num" = "1");
3)SQL示例
INSERT INTO dws_finance_detailSELECTf.finance_id,f.department,f.project_id,p.project_name,SUM(o.amount) AS revenue,SUM(c.cost) AScost,SUM(o.amount) - SUM(c.cost) AS gross_profit,ROUND((SUM(o.amount) - SUM(c.cost)) * 100.0 / NULLIF(SUM(o.amount), 0), 2) AS gross_profit_rate,SUM(o.amount) - SUM(c.cost) AS cash_flow,NOW() AS update_timeFROM dwd_order_fact oJOIN dwd_project_info p ON o.project_id = p.project_idJOIN dwd_cost_info c ON o.project_id = c.project_idWHERE o.order_date >= DATE_SUB(CURRENT_DATE, 30)GROUPBY f.finance_id, f.department, f.project_id, p.project_name;
场景:物流时效分析、运输成本优化、异常订单监控。
1)核心字段设计
|
|
|
|
|
|---|---|---|---|
logistics_id |
|
|
|
logistics_company |
|
|
|
transport_type |
|
|
|
region |
|
|
|
delivery_time |
|
|
|
delivery_cost |
|
|
|
on_time_rate |
|
|
|
exception_order_count |
|
|
|
update_time |
|
|
|
2)建表SQL
CREATE TABLE dws_logistics_detail (logistics_id VARCHAR(32) COMMENT'物流ID',logistics_company VARCHAR(32) COMMENT'物流公司',transport_type VARCHAR(16) COMMENT'运输方式',region VARCHAR(32) COMMENT'区域',delivery_time INTCOMMENT'配送时长(小时)',delivery_cost DECIMAL(8,2) COMMENT'运输成本',on_time_rate DECIMAL(5,2) COMMENT'准时率(%)',exception_order_count INTCOMMENT'异常订单数',update_time DATETIME COMMENT'数据更新时间',PRIMARY KEY (logistics_id)) COMMENT'物流主题宽表'DISTRIBUTEDBYHASH(logistics_id) BUCKETS 32PROPERTIES ("replication_num" = "1");
3)SQL代码
INSERT INTO dws_logistics_detailSELECTl.logistics_id,l.logistics_company,l.transport_type,l.region,AVG(TIMESTAMPDIFF(HOUR, o.order_time, l.delivery_time)) AS delivery_time,AVG(l.cost) AS delivery_cost,ROUND(SUM(CASEWHEN l.delivery_time <= o.expected_delivery_time THEN1ELSE0END) * 100.0 / NULLIF(COUNT(*), 0), 2) AS on_time_rate,COUNT(CASEWHEN l.delivery_time > o.expected_delivery_time THEN1END) AS exception_order_count,NOW() AS update_timeFROM dwd_order_detail oJOIN dwd_logistics_info l ON o.order_id = l.order_idWHERE o.order_date = CURDATE()GROUPBY l.logistics_id, l.logistics_company, l.transport_type, l.region;
场景:数据异常检测、ETL流程监控、数据一致性校验。
1)核心字段设计
|
|
|
|
|
|---|---|---|---|
table_name |
|
|
|
column_name |
|
|
|
data_type |
|
|
|
null_rate |
|
|
|
duplicate_rate |
|
|
|
data_lag |
|
|
|
consistency_rate |
|
|
|
check_time |
|
|
|
status |
|
|
|
2)建表SQL
CREATE TABLE dws_data_quality (table_name VARCHAR(64) COMMENT'表名',column_name VARCHAR(64) COMMENT'字段名',data_type VARCHAR(16) COMMENT'数据类型',null_rate DECIMAL(5,2) COMMENT'空值率(%)',duplicate_rate DECIMAL(5,2) COMMENT'重复率(%)',data_lag INTCOMMENT'数据延迟(小时)',consistency_rate DECIMAL(5,2) COMMENT'一致性率(%)',check_time DATETIME COMMENT'检查时间',statusVARCHAR(16) COMMENT'状态',PRIMARY KEY (table_name, column_name, check_time)) COMMENT'数据质量监控宽表'DISTRIBUTEDBYHASH(table_name) BUCKETS 32PROPERTIES ("replication_num" = "1");
3)SQL示例
INSERT INTO dws_data_qualitySELECT'dwd_order_detail'AS table_name,'order_id'AS column_name,'VARCHAR'AS data_type,ROUND(COUNT(*) * 100.0 / NULLIF((SELECTCOUNT(*) FROM dwd_order_detail), 0), 2) AS null_rate,ROUND(COUNT(*) * 100.0 / NULLIF((SELECTCOUNT(*) FROM dwd_order_detail), 0), 2) AS duplicate_rate,TIMESTAMPDIFF(HOUR, MAX(order_time), NOW()) AS data_lag,ROUND(COUNT(*) * 100.0 / NULLIF((SELECTCOUNT(*) FROM dwd_order_detail), 0), 2) AS consistency_rate,NOW() AS check_time,CASEWHENROUND(COUNT(*) * 100.0 / NULLIF((SELECTCOUNT(*) FROM dwd_order_detail), 0), 2) < 5ANDTIMESTAMPDIFF(HOUR, MAX(order_time), NOW()) < 4THEN'正常'ELSE'异常'ENDASstatusFROM dwd_order_detailWHERE order_time < DATE_SUB(NOW(), INTERVAL1HOUR);
三、DWS宽表建模的黄金法则
主题驱动:宽表必须围绕明确的业务主题(如商品、用户)设计,避免"万能宽表"。
轻度聚合:聚合粒度不宜过细(如按天),也不宜过粗(如按月),需根据业务需求调整。
维度统一:确保宽表中的维度与DWD层一致,避免口径不一致导致的分析混乱。
性能优先:通过预计算和索引设计(如ClickHouse的ReplacingMergeTree引擎)提升查询效率。
公共性原则:宽表需服务于多个上层应用(如报表、BI工具),避免重复开发。
作者介绍
范老师,资深数仓专家,10年数据老兵。

