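Summary
The official Apache DolphinScheduler upgrade documentation provides an upgrade script. For an upgrade across minor versions, running the script is all that is needed, but an upgrade that spans several major versions can still run into a variety of problems. This article summarizes them.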
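After the upgrade, the Resource Center reports IllegalArgumentException: Failed to specify server's Kerberos principal name
After the upgrade, viewing a task instance's log reports that the log cannot be found
After the upgrade, creating a workflow reports an error
After the upgrade, the task instance list is empty
A NullPointerException is thrown while the upgrade script runs
    The log points to line 517 of UpgradeDao.java
    The log points to line 675 of UpgradeDao.java
Login fails after integrating LDAP because the email attribute name is unknown
Resource file permissions granted by the administrator to a normal user do not take effect
Kerberos ticket expiry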
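1. After the upgrade, the Resource Center reports IllegalArgumentException: Failed to specify server's Kerberos principal name
Solution: add the following property to the HDFS client configuration (typically hdfs-site.xml):
<property>
    <name>dfs.namenode.kerberos.principal.pattern</name>
    <value>*</value>
</property>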
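2. After the upgrade, viewing a task instance's log reports that the log cannot be found
Solution: update the log paths stored in the database, then copy the old log files into the new worker-server log directory:
update t_ds_task_instance set log_path=replace(log_path,'/logs/','/worker-server/logs/');
cp -r {old_dolphinscheduler_dir}/logs/[1-9]* {new_dolphinscheduler_dir}/worker-server/logs/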
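To make the effect of the UPDATE statement concrete, here is a minimal Java sketch of the same string substitution applied to a single path. The class name, method name and sample path are hypothetical and serve only as an illustration. Both MySQL's replace() and Java's String.replace substitute every occurrence of the search string.

```java
public class LogPathMigration {

    // Mirrors replace(log_path, '/logs/', '/worker-server/logs/') from the SQL statement.
    static String migrate(String oldLogPath) {
        if (oldLogPath == null) {
            return null; // replace() on NULL yields NULL in SQL as well
        }
        return oldLogPath.replace("/logs/", "/worker-server/logs/");
    }

    public static void main(String[] args) {
        // Hypothetical path, for illustration only.
        System.out.println(migrate("/opt/dolphinscheduler/logs/20230101/1_1.log"));
    }
}
```

Note that the substitution is not idempotent: the new path still contains /logs/, so applying it a second time produces /worker-server/worker-server/logs/. If there is any chance the UPDATE runs twice, restricting it with a condition such as log_path not like '%/worker-server/logs/%' would be one way to guard against that.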
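3. After the upgrade, creating a workflow reports an error
Solution:
# Look up the current auto-increment value of the primary key
select AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'dolphinscheduler' AND TABLE_NAME = 't_ds_process_definition' limit 1;
# Fill the result of the query above into the parameter below, then run it
alter table dolphinscheduler_bak1.t_ds_process_definition_log auto_increment = {max_id};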
4. After the upgrade, the task instance list is empty
The query that backs the task instance list is:
select
<include refid="baseSqlV2">
    <property name="alias" value="instance"/>
</include>
,
process.name as process_instance_name
from t_ds_task_instance instance
left join t_ds_task_definition_log define on define.code=instance.task_code and define.version=instance.task_definition_version
left join t_ds_process_instance process on process.id=instance.process_instance_id
where define.project_code = #{projectCode}
<if test="startTime != null">
    and instance.start_time <![CDATA[ >= ]]> #{startTime}
</if>
...... (remainder omitted)
Solution: resolve the project through t_ds_process_instance and t_ds_process_definition instead of t_ds_task_definition_log:
select
<include refid="baseSqlV2">
    <property name="alias" value="instance"/>
</include>
,
process.name as process_instance_name
from t_ds_task_instance instance
-- left join t_ds_task_definition_log define
-- on define.code=instance.task_code and
-- define.version=instance.task_definition_version
join t_ds_process_instance process
on process.id=instance.process_instance_id
join t_ds_process_definition define
on define.code=process.process_definition_code
where define.project_code = #{projectCode}
<if test="startTime != null">
    and instance.start_time <![CDATA[ >= ]]> #{startTime}
</if>
...... (remainder omitted)
5. A NullPointerException is thrown while the upgrade script runs
5.1
The log points to line 517 of UpgradeDao.java:
513 if (TASK_TYPE_SUB_PROCESS.equals(taskType)) {
514     JsonNode jsonNodeDefinitionId = param.get("processDefinitionId");
515     if (jsonNodeDefinitionId != null) {
516         param.put("processDefinitionCode",
517                 processDefinitionMap.get(jsonNodeDefinitionId.asInt()).getCode());
518         param.remove("processDefinitionId");
519     }
520 }
Solution: check the lookup result for null before calling getCode(), and log the entries that cannot be resolved:
if (jsonNodeDefinitionId != null) {
    // Only convert the id to a code when the process definition is present in the map
    if (processDefinitionMap.get(jsonNodeDefinitionId.asInt()) != null) {
        param.put("processDefinitionCode",
                processDefinitionMap.get(jsonNodeDefinitionId.asInt()).getCode());
        param.remove("processDefinitionId");
    } else {
        logger.error("*******************error");
        logger.error("*******************param:" + param);
        logger.error("*******************jsonNodeDefinitionId:" + jsonNodeDefinitionId);
    }
}
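The guard above boils down to "look up, then check for null before dereferencing". A minimal, self-contained sketch of that pattern follows. The class, the method and the Map<Integer, Long> stand-in for processDefinitionMap are hypothetical simplifications and not the actual types used in UpgradeDao.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

public class SubProcessCodeLookup {

    // Hypothetical stand-in for the id -> code mapping; returns empty instead of
    // throwing when the referenced process definition id is not in the map.
    static OptionalLong findCode(Map<Integer, Long> idToCode, int processDefinitionId) {
        Long code = idToCode.get(processDefinitionId);
        return code == null ? OptionalLong.empty() : OptionalLong.of(code);
    }

    public static void main(String[] args) {
        Map<Integer, Long> idToCode = new HashMap<>();
        idToCode.put(1, 1001L);

        System.out.println(findCode(idToCode, 1).isPresent());  // id 1 is known
        System.out.println(findCode(idToCode, 99).isPresent()); // id 99 is missing
    }
}
```

Returning an empty result forces the caller to decide what to do with a dangling reference, for example logging it as the fix does, rather than failing the whole upgrade run.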
5.2
The log points to line 675 of UpgradeDao.java:
669 if (mapEntry.isPresent()) {
670     Map.Entry<Long, Map<String, Long>> processCodeTaskNameCodeEntry = mapEntry.get();
671     dependItem.put("definitionCode", processCodeTaskNameCodeEntry.getKey());
672     String depTasks = dependItem.get("depTasks").asText();
673     long taskCode =
674             "ALL".equals(depTasks) || processCodeTaskNameCodeEntry.getValue() == null ? 0L
675                     : processCodeTaskNameCodeEntry.getValue().get(depTasks);
676     dependItem.put("depTaskCode", taskCode);
677 }
Solution: default the task code to 0 and only overwrite it when the task name is actually present in the map:
long taskCode = 0;
if (processCodeTaskNameCodeEntry.getValue() != null
        && processCodeTaskNameCodeEntry.getValue().get(depTasks) != null) {
    taskCode = processCodeTaskNameCodeEntry.getValue().get(depTasks);
} else {
    logger.error("******************** depTasks:" + depTasks);
    logger.error("******************** taskCode not in " + JSONUtils.toJsonString(processCodeTaskNameCodeEntry));
}
dependItem.put("depTaskCode", taskCode);
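The reason the original expression fails is auto-unboxing: in a conditional expression that mixes long (0L) and Long (the result of Map.get), the Long operand is unboxed, so a missing key turns into a NullPointerException. The sketch below reproduces this with a plain Map<String, Long>. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class DepTaskCodeLookup {

    // Same shape as the original expression: the Long returned by get() is
    // unboxed to long, which throws NullPointerException for a missing key.
    static long unsafeTaskCode(Map<String, Long> taskNameToCode, String depTasks) {
        return "ALL".equals(depTasks) || taskNameToCode == null ? 0L : taskNameToCode.get(depTasks);
    }

    // Null-safe variant: falls back to 0 when the task name is unknown.
    static long safeTaskCode(Map<String, Long> taskNameToCode, String depTasks) {
        if ("ALL".equals(depTasks) || taskNameToCode == null) {
            return 0L;
        }
        Long code = taskNameToCode.get(depTasks);
        return code == null ? 0L : code;
    }

    public static void main(String[] args) {
        Map<String, Long> taskNameToCode = new HashMap<>();
        taskNameToCode.put("taskA", 42L);

        System.out.println(safeTaskCode(taskNameToCode, "taskA"));
        System.out.println(safeTaskCode(taskNameToCode, "missing"));
        try {
            unsafeTaskCode(taskNameToCode, "missing");
        } catch (NullPointerException e) {
            System.out.println("unsafe lookup threw NullPointerException");
        }
    }
}
```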
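6. Login fails after integrating LDAP because the email attribute name is unknown
The LDAP settings in the configuration file look like this:
security:
  authentication:
    # Authentication types (supported types: PASSWORD,LDAP)
    type: LDAP
    # IF you set type `LDAP`, below config will be effective
    ldap:
      # ldap server config
      urls: xxx
      base-dn: xxx
      username: xxx
      password: xxx
      user:
        # admin userId when you use LDAP login
        admin: xxx
        identity-attribute: xxx
        email-attribute: xxx
        # action when ldap user is not exist (supported types: CREATE,DENY)
        not-exist-action: CREATE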
Solution: the email attribute is read by the following LDAP login code:
ctx = new InitialLdapContext(searchEnv, null);
SearchControls sc = new SearchControls();
sc.setReturningAttributes(new String[]{ldapEmailAttribute});
sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
EqualsFilter filter = new EqualsFilter(ldapUserIdentifyingAttribute, userId);
NamingEnumeration<SearchResult> results = ctx.search(ldapBaseDn, filter.toString(), sc);
if (results.hasMore()) {
    // get the users DN (distinguishedName) from the result
    SearchResult result = results.next();
    NamingEnumeration<? extends Attribute> attrs = result.getAttributes().getAll();
    while (attrs.hasMore()) {
        // Open another connection to the LDAP server with the found DN and the password
        searchEnv.put(Context.SECURITY_PRINCIPAL, result.getNameInNamespace());
        searchEnv.put(Context.SECURITY_CREDENTIALS, userPwd);
        try {
            new InitialDirContext(searchEnv);
        } catch (Exception e) {
            logger.warn("invalid ldap credentials or ldap search error", e);
            return null;
        }
        Attribute attr = attrs.next();
        if (attr.getID().equals(ldapEmailAttribute)) {
            return (String) attr.get();
        }
    }
}
Temporarily comment out the line that restricts the returned attributes, so that the search returns every attribute of the entry:
// sc.setReturningAttributes(new String[]{ldapEmailAttribute});
Then debug (or log) at the following line to inspect all attributes, find the one that holds the email address, and put its name into the configuration file:
NamingEnumeration<? extends Attribute> attrs = result.getAttributes().getAll();
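If attaching a debugger is inconvenient, the attributes can also be dumped in code. The following is a minimal sketch using only the JDK's javax.naming classes. The class name, the helper method and the "contains @" heuristic are assumptions made for illustration, and the in-memory BasicAttributes object stands in for the attributes a real LDAP search result would carry.

```java
import java.util.ArrayList;
import java.util.List;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.BasicAttributes;

public class LdapAttributeDump {

    // Prints every attribute and returns the ids whose first value looks like
    // an email address (hypothetical heuristic: the value contains '@').
    static List<String> emailLikeAttributeIds(Attributes attributes) {
        List<String> ids = new ArrayList<>();
        try {
            NamingEnumeration<? extends Attribute> all = attributes.getAll();
            while (all.hasMore()) {
                Attribute attr = all.next();
                if (attr.size() == 0) {
                    continue; // attribute without values
                }
                Object value = attr.get();
                System.out.println(attr.getID() + " = " + value);
                if (value instanceof String && ((String) value).contains("@")) {
                    ids.add(attr.getID());
                }
            }
        } catch (NamingException e) {
            throw new IllegalStateException("failed to read LDAP attributes", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        // In-memory stand-in for result.getAttributes(); values are made up.
        BasicAttributes attrs = new BasicAttributes(true);
        attrs.put("uid", "someuser");
        attrs.put("mail", "someuser@example.com");
        System.out.println(emailLikeAttributeIds(attrs));
    }
}
```

In the real login code the same loop could be pointed at result.getAttributes() once the setReturningAttributes restriction is commented out.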
7. Resource file permissions granted by the administrator to a normal user do not take effect
Solution: return the ids of both the granted resources and the user's own resources, instead of only the user's own resources:
@Override
public Set<Integer> listAuthorizedResource(int userId, Logger logger) {
    List<Resource> relationResources;
    if (userId == 0) {
        relationResources = new ArrayList<>();
    } else {
        // query resource relation
        List<Integer> resIds = resourceUserMapper.queryResourcesIdListByUserIdAndPerm(userId, 0);
        relationResources = CollectionUtils.isEmpty(resIds) ? new ArrayList<>() : resourceMapper.queryResourceListById(resIds);
    }
    List<Resource> ownResourceList = resourceMapper.queryResourceListAuthored(userId, -1);
    relationResources.addAll(ownResourceList);
    // Fix: include the granted resources so that authorization takes effect
    return relationResources.stream().map(Resource::getId).collect(toSet());
    // return ownResourceList.stream().map(Resource::getId).collect(toSet());
}
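The difference between the old and the new return statement can be shown with plain id lists. This is a simplified sketch: the class and method names are hypothetical, and resources are reduced to their integer ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class AuthorizedResourceIds {

    // Old behavior: the granted resources are collected but never returned.
    static Set<Integer> ownedOnly(List<Integer> grantedIds, List<Integer> ownedIds) {
        return ownedIds.stream().collect(Collectors.toSet());
    }

    // Fixed behavior: granted and owned resources are merged before returning.
    static Set<Integer> grantedAndOwned(List<Integer> grantedIds, List<Integer> ownedIds) {
        List<Integer> merged = new ArrayList<>(grantedIds);
        merged.addAll(ownedIds);
        return merged.stream().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> granted = Arrays.asList(10, 11); // authorized by the administrator
        List<Integer> owned = Arrays.asList(1, 2);     // created by the user

        System.out.println(ownedOnly(granted, owned).containsAll(granted));
        System.out.println(grantedAndOwned(granted, owned).containsAll(granted));
    }
}
```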
8. Kerberos ticket expiry
Solution: add a scheduled job that checks the TGT and re-logs in from the keytab, and start it from loadKerberosConf:
/**
 * Periodically refresh the Kerberos credentials.
 */
private static void startCheckKeytabTgtAndReloginJob() {
    // Runs once a day to refresh the credentials
    Executors.newScheduledThreadPool(1).scheduleWithFixedDelay(() -> {
        try {
            UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
            logger.warn("Check Kerberos Tgt And Relogin From Keytab Finish.");
        } catch (IOException e) {
            logger.error("Check Kerberos Tgt And Relogin From Keytab Error", e);
        }
    }, 0, 1, TimeUnit.DAYS);
    logger.info("Start Check Keytab TGT And Relogin Job Success.");
}
public static boolean loadKerberosConf(String javaSecurityKrb5Conf, String loginUserKeytabUsername,
                                       String loginUserKeytabPath, Configuration configuration) throws IOException {
    if (CommonUtils.getKerberosStartupState()) {
        System.setProperty(Constants.JAVA_SECURITY_KRB5_CONF, StringUtils.defaultIfBlank(javaSecurityKrb5Conf,
                PropertyUtils.getString(Constants.JAVA_SECURITY_KRB5_CONF_PATH)));
        configuration.set(Constants.HADOOP_SECURITY_AUTHENTICATION, Constants.KERBEROS);
        UserGroupInformation.setConfiguration(configuration);
        UserGroupInformation.loginUserFromKeytab(
                StringUtils.defaultIfBlank(loginUserKeytabUsername,
                        PropertyUtils.getString(Constants.LOGIN_USER_KEY_TAB_USERNAME)),
                StringUtils.defaultIfBlank(loginUserKeytabPath,
                        PropertyUtils.getString(Constants.LOGIN_USER_KEY_TAB_PATH)));
        startCheckKeytabTgtAndReloginJob(); // start the relogin job here
        return true;
    }
    return false;
}
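One property of scheduleWithFixedDelay is worth keeping in mind for a job like this: if a run ends with an uncaught exception, all later runs are suppressed. The sketch below shows a slightly more defensive variant of the scheduling pattern. It is not the code from the article: the class and method names are hypothetical, the relogin call is replaced by a plain Runnable, and the use of a daemon thread is an assumption about what is desirable.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicRelogin {

    /**
     * Runs reloginAction immediately and then again after each delay.
     * reloginAction is a hypothetical stand-in for the keytab relogin call.
     */
    static ScheduledExecutorService start(Runnable reloginAction, long delay, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "kerberos-relogin");
            thread.setDaemon(true); // do not keep the JVM alive just for this job
            return thread;
        });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                reloginAction.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so log and continue.
                System.err.println("relogin attempt failed: " + e);
            }
        }, 0, delay, unit);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler =
                start(() -> System.out.println("relogin check"), 200, TimeUnit.MILLISECONDS);
        Thread.sleep(500);
        scheduler.shutdownNow();
    }
}
```

Returning the executor lets the caller shut the job down cleanly, which the fire-and-forget Executors.newScheduledThreadPool(1) call does not allow.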
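Contributing
With the rapid rise of open source in China, the Apache DolphinScheduler community is thriving. To build a better and easier-to-use scheduler, we sincerely welcome everyone who loves open source to join the community, contribute to the rise of Chinese open source, and help home-grown open source go global.
There are many ways to contribute to the DolphinScheduler community.
Contributing your first PR (documentation or code): we hope the first PR is a simple one, used to get familiar with the submission process and community collaboration, and to get a feel for how friendly the community is.
The community has compiled a list of issues suitable for newcomers: https://github.com/apache/dolphinscheduler/issues/5689
List of issues for non-newcomers: https://github.com/apache/dolphinscheduler/issues?q=is%3Aopen+is%3Aissue+label%3A%22volunteer+wanted%22
How to contribute: https://dolphinscheduler.apache.org/zh-cn/community/development/contribute.html
Come and join us. The DolphinScheduler open source community needs your participation. Even a small contribution adds up to something powerful when combined with others.
Taking part in open source lets you learn from skilled developers up close and improve your own skills quickly. If you would like to contribute, we have a contributor incubation group. Add the community assistant on WeChat (Leonard-ds) and you will be guided step by step. Contributors of every level are welcome and every question gets an answer; what matters is the willingness to contribute.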

When adding the community assistant on WeChat, please mention that you would like to contribute.

