
Iceberg Updates
Apache Iceberg 1.3.0 has been released
Spark now supports the Iceberg UUID type as strings to ensure compatibility with Trino tables. Other notable updates:
Flink 1.17 is now supported
Spark 3.4 is now supported
Added support for the Spark TIMESTAMP_NTZ type, which maps to Iceberg's TIMESTAMP WITHOUT TIMEZONE #7553
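Removed the sort from the MERGE cardinality check (Thanks, Anton!) #7558
Added a procedure for rewriting position delete files (Thanks, Szehon!)
Fixed FileIO closing issues (Thanks, Eduard!)
JDK 17 is now supported
Support has been removed for the following engine versions: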
Flink 1.14
Spark 2.4
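The position delete rewrite listed above is exposed to Spark as a stored procedure. As a rough sketch of how it could be called from PySpark: the catalog name `my_catalog`, the table `db.events`, and both helper functions are hypothetical names chosen for illustration, and the code assumes a SparkSession that is already configured with the Iceberg runtime and SQL extensions.

```python
def rewrite_position_deletes_sql(catalog: str, table: str) -> str:
    """Build the CALL statement for Iceberg's rewrite_position_delete_files procedure."""
    # Named-argument form of the procedure call; `catalog` and `table` are caller-supplied.
    return f"CALL {catalog}.system.rewrite_position_delete_files(table => '{table}')"


def rewrite_position_deletes(spark, catalog: str, table: str):
    """Run the procedure on an existing SparkSession and return the result DataFrame."""
    return spark.sql(rewrite_position_deletes_sql(catalog, table))


if __name__ == "__main__":
    # Hypothetical names; this only prints the statement, so no cluster is needed.
    print(rewrite_position_deletes_sql("my_catalog", "db.events"))
```

Keeping the statement builder separate from the `spark.sql` call makes it possible to inspect or log the SQL before running it against a real table.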
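PyIceberg Updates
It is an exciting time for PyIceberg. We are about to finalize the PyIceberg 0.4.0 release, which brings the following:
Support for converting a Parquet schema into an Iceberg schema
Support for reading data using FSSpec
Support for fetching a limited number of rows to take a quick look at a dataset
Upgraded to PyArrow >= 12.0.0 for improved performance
Improved query performance by pushing filters down to the metadata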
Ability to use SQL-style filters, for example row_filter='passengers>=3'
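SigV4 support for the REST Catalog
A complete overhaul of the documentation site
Many bug fixes!
More information can be found on the official website, and the package has been uploaded to PyPI.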
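To give a feel for the row limit and SQL-style filter features listed above, here is a minimal sketch using PyIceberg's table scan. The `passengers` column, the default values, and the two helper functions are assumptions made for illustration; the catalog name and table identifier would come from your own PyIceberg configuration.

```python
def preview_rows(table, row_filter: str = "passengers >= 3", limit: int = 10):
    """Return up to `limit` rows matching `row_filter` as a PyArrow table."""
    # scan() accepts a SQL-style filter string and a row limit;
    # to_arrow() materializes the result in memory.
    scan = table.scan(row_filter=row_filter, limit=limit)
    return scan.to_arrow()


def load_table(catalog_name: str, identifier: str):
    """Load an Iceberg table through a catalog configured for PyIceberg."""
    # Imported lazily so that preview_rows can be used without a catalog at import time.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(catalog_name)
    return catalog.load_table(identifier)
```

With a configured catalog, a call such as `preview_rows(load_table("default", "nyc.taxis"))` would fetch a small sample; both of those names are placeholders.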
Iceberg in the Industry
IBM supports Iceberg as part of its WatsonX AI initiative
https://www.ibm.com/blog/ibm-to-help-businesses-scale-ai-workloads-for-all-data-anywhere/
Starburst launched a new Iceberg page
https://www.starburst.io/info/apache-iceberg/
Starburst added support for Tabular
https://docs.starburst.io/starburst-galaxy/catalogs/tabular.html
Community blogs:
Tabular - Securing the Data Lake - Part II
https://tabular.io/blog/securing-the-data-lake-part-2/
Starburst - How to migrate your Hive tables to Apache Iceberg
https://www.starburst.io/blog/how-to-migrate-your-hive-tables-to-apache-iceberg/
Starburst - Tutorial: Materialized views in Starburst Galaxy
https://www.starburst.io/blog/tutorial-using-starburst-galaxys-materialized-views-with-apache-iceberg/
Tabular - Tutorial: Building a data warehouse with Trino and Iceberg
https://tabular.io/tutorials/using-trino-and-iceberg/
Anuj Syal - Top 5 new data engineering technologies to learn in 2023
https://medium.datadriveninvestor.com/top-5-new-data-engineering-technologies-to-learn-in-2023-2985af82718
Marin Aglić - Learning Apache Iceberg: Storing the catalog to Postgres
https://betterprogramming.pub/learning-apache-iceberg-storing-the-catalog-to-postgres-c54ef5e7c628
Amazon - Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes
https://aws.amazon.com/cn/blogs/big-data/improve-operational-efficiencies-of-apache-iceberg-tables-built-on-amazon-s3-data-lakes
Iceberg in the News
Oracle: Oracle Autonomous Data Warehouse Breaks Through the Limitations of Data Management
https://www.oracle.com/news/announcement/new-autonomous-data-warehouse-innovations-2023-05-03/
The Register: Trino and dbt open source data tools snuggle closer with integrated SaaS
https://www.theregister.com/2023/04/28/starburst_dbt_saas/
CXOtoday: Cloudera Recognized as a Leader in 2023 GigaOm Radar for Data Lakes and Lakehouses
https://www.cxotoday.com/press-release/cloudera-recognized-as-a-leader-in-2023-gigaom-radar-for-data-lakes-and-lakehouses/
datanami: The Semantic Layer Architecture: Where Business Intelligence is Truly Heading
https://www.datanami.com/2023/05/15/the-semantic-layer-architecture-where-business-intelligence-is-truly-heading/
datanami: HPE Brings Analytics Together on its Data Fabric
https://www.datanami.com/2023/05/16/hpe-brings-analytics-together-on-its-data-fabric/

