In this webinar, we will cover the ins and outs of the migration process with Iceberg as the target, and we will demonstrate open source tooling that helps smooth the transition. Jason Reid, Head of Product at Tabular, who led the original migration from Hive to Iceberg at Netflix, will cover:
- Why migrate? The advantages of leaving Hive for a modern format
- Common migration challenges and considerations
- A sample migration project plan that helps ensure you follow best practices
As a “demo” bonus, Jason Reid, Tabular co-founder and head of product, will walk through three useful open source tools that can streamline the migration effort.
Covers SparkSessionCatalog, IcebergSparkSessionExtensions, spark_catalog.system.snapshot, spark_catalog.system.migrate, spark_catalog.system.add_files, tabular.system.register_table
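As a rough illustration of the items listed above, the following sketch assembles the Spark settings that enable IcebergSparkSessionExtensions and SparkSessionCatalog, then renders CALL statements for the snapshot, migrate and add_files procedures. It only builds strings and does not start Spark. The table names (db.events and friends) and the `call` helper are hypothetical and are not taken from the webinar; the procedure argument names follow the Iceberg Spark procedures documentation.

```python
# Spark settings that turn on Iceberg's SQL extensions and wrap the built-in
# session catalog, so Hive and Iceberg tables resolve through spark_catalog.
SPARK_CONF = {
    "spark.sql.extensions":
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.spark_catalog":
        "org.apache.iceberg.spark.SparkSessionCatalog",
    "spark.sql.catalog.spark_catalog.type": "hive",
}


def call(procedure, **args):
    """Render a CALL statement with named string arguments.

    Hypothetical helper for this sketch only; it does not escape quotes.
    """
    rendered = ", ".join(f"{name} => '{value}'" for name, value in args.items())
    return f"CALL {procedure}({rendered})"


# Hypothetical table names, chosen for illustration.
HIVE_TABLE = "db.events"

statements = [
    # 1. Snapshot: create an Iceberg table over the Hive table's files for
    #    testing, leaving the source table untouched.
    call("spark_catalog.system.snapshot",
         source_table=HIVE_TABLE, table="db.events_iceberg_test"),
    # 2. Migrate: replace the Hive table in place with an Iceberg table.
    call("spark_catalog.system.migrate", table=HIVE_TABLE),
    # 3. Add files: register the Hive table's existing data files in an
    #    already-created Iceberg table.
    call("spark_catalog.system.add_files",
         table="db.events_iceberg", source_table=HIVE_TABLE),
]

for statement in statements:
    print(statement)
```

With a Spark session built from SPARK_CONF and the Iceberg runtime on the classpath, each statement could be passed to `spark.sql(...)`. Snapshot is the low-risk first step because it leaves the source table as it was.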
...
https://www.youtube.com/watch?v=JVhzdPbbNXk
In this video, we demonstrate the new features of PyIceberg 0.2.1. For the demo, we use the docker-spark-iceberg setup that's available here: https://github.com/tabular-io/docker-spark-iceberg
After spinning up the docker-compose setup, the Jupyter notebook will be available at http://localhost:8888/
The notebook "PyIceberg - Getting Started.ipynb" guides you through reading data into PyArrow and then into Pandas. The last part demonstrates how to query the Pandas dataset using DuckDB.
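The flow the notebook describes (Iceberg table to PyArrow, then Pandas, then DuckDB) can be sketched roughly as follows. This is not the notebook's code. The catalog name "default", the table identifier "nyc.taxis" and both function names are assumptions made for illustration, and `main()` needs pyiceberg, pyarrow, pandas and duckdb installed plus a configured catalog.

```python
def scan_to_pandas(catalog, identifier):
    """Load an Iceberg table and return its rows as a Pandas DataFrame."""
    table = catalog.load_table(identifier)   # look the table up in the catalog
    arrow_table = table.scan().to_arrow()    # full table scan into PyArrow
    return arrow_table.to_pandas()           # PyArrow table to Pandas


def main():
    # Imported here so the helper above stays importable without these packages.
    from pyiceberg.catalog import load_catalog
    import duckdb

    # Assumption: a catalog named "default" is configured, for example in
    # ~/.pyiceberg.yaml, and contains a table called nyc.taxis.
    catalog = load_catalog("default")
    df = scan_to_pandas(catalog, "nyc.taxis")

    # DuckDB can query a Pandas DataFrame in scope by its variable name.
    print(duckdb.query("SELECT COUNT(*) AS trips FROM df").fetchall())
```

Calling `main()` runs the whole chain. A scan with no filter reads the entire table into memory, so for large tables a row filter or column selection on `table.scan(...)` would be the natural next step.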
For a complete overview of all the installation options, please refer to the documentation: https://py.iceberg.apache.org/
If you have any questions, please reach out on the Iceberg Slack: https://iceberg.apache.org/community/ or open an issue or pull request on GitHub: https://github.com/apache/iceberg
#iceberg #python #pyarrow #duckdb #tabular #datalake
This 5-minute video describes how to make your Iceberg implementation fast, efficient, and cost-effective.
...
https://www.youtube.com/watch?v=EJETzJCQ5os
00:00 Intro
00:24 Brian Olsen
05:29 Alex Merced
06:40 Sam Redai
10:40 Ryan Blue
Series: Ask the Iceberg Experts
Guests:
- Brian Olsen, Developer Advocate Trino/Starburst, Iceberg contributor
- Alex Merced, Developer Advocate Dremio
- Sam Redai, Software Engineer Netflix, Iceberg contributor
- Ryan Blue, Tabular CEO, co-creator of Iceberg
Subject: We talk with Iceberg experts from around the industry about the highlights of Iceberg evolution and adoption in 2022.
iceberg.apache.org
www.dremio.com
www.starburst.io
www.trino.io
www.netflix.com
www.tabular.com
#iceberg #datalake #datalakehouse
...
https://www.youtube.com/watch?v=wAKm5gqEGE0
Jason Reid, head of product at Tabular and former data engineering director at Netflix, covers how Tabular implements best practices for file, stream and CDC ingestion.
...
https://www.youtube.com/watch?v=JWRA6fFSTjs
See a demonstration of connecting Amazon S3 storage to Tabular to create Apache Iceberg tables for querying by Amazon Athena, Spark (EMR), Snowflake, Trino and other query engines.
...
https://www.youtube.com/watch?v=bhozVwfOpfg
Series: Tabular Solutions
Guest: Albert Wong, Developer Advocate, CelerData
Subject: Accessing Tabular managed Iceberg tables from CelerData
Albert shows Shawn how to use CelerData to query and create data in Tabular managed Iceberg tables. CelerData is a managed solution for StarRocks, an open-source MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc query.
www.celerdata.com
www.tabular.io
#iceberg #datalake #apacheiceberg #datalakehouse #starrocks #celerdata #tabular
...
https://www.youtube.com/watch?v=bAmcTrX7hCI
Jason Reid, head of product at Tabular, discusses how to use incremental processing to take advantage of Iceberg's ACID guarantees and reduce compute costs.
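One common way to process an Iceberg table incrementally is Spark's incremental read, which returns only the rows appended between two snapshots. The sketch below is an assumption about how that could look and is not taken from the video. Both function names are hypothetical, while `start-snapshot-id` (exclusive) and `end-snapshot-id` (inclusive) are Iceberg's Spark read options.

```python
def incremental_read_options(last_processed_snapshot_id, current_snapshot_id):
    """Build Spark read options covering only snapshots not yet processed.

    Returns None when the table has not changed since the last run, so the
    caller can skip the job entirely.
    """
    if last_processed_snapshot_id == current_snapshot_id:
        return None
    return {
        "start-snapshot-id": str(last_processed_snapshot_id),  # exclusive
        "end-snapshot-id": str(current_snapshot_id),           # inclusive
    }


def read_new_rows(spark, table_name, options):
    """Read the rows appended between the two snapshots in `options`."""
    reader = spark.read.format("iceberg")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load(table_name)
```

Because each snapshot is an atomic commit, a job that records the last snapshot ID it processed can pick up exactly the new data on its next run instead of rescanning the table. This style of read covers appended data only; tables with overwrites or deletes need a different approach.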
...
https://www.youtube.com/watch?v=H8E5kNGTgEk