LBRY Block Explorer

LBRY Claims • integrate-schema-registry-kafka-in

6bd8fa93f8f003ee12a939da8dfa3a89e3c1d723

Published By:
Created On: 11 Sep 2022 09:32:49 UTC
Transaction ID:
Cost: Free
Safe for Work: Yes
Integrate Schema Registry & Kafka in Python to Build Streaming Processing|蘇揮原 Mars Su
PyCon APAC 2022|一般演講 Talks|國泰金控 Cathay Financial Holdings / 美光科技 Micron 冠名贊助 (Title Sponsors)

✏️ 共筆 Note:https://hackmd.io/@pycontw/BkwCLT7Ji
Slido: https://app.sli.do/event/sE3aaBEUsYZ7HiLkrvcrAp
投影片 Slides: https://drive.google.com/file/d/1NdBaMqZjVTucV_Z2EY2iFub826hF92ga/view?usp=sharing
語言 Language: 中文演講/英文投影片 Chinese talk w. English slides
層級 Level: 中階 Intermediate
分類 Category: 應用 Application

摘要 Abstract
In today's data-driven world, we are often faced with how to process and analyze data effectively and in real time, and streaming processing is an important answer. In addition, data carries different schemas for different applications and needs. To ensure data correctness and availability in a streaming application, schema verification needs to be integrated into the streaming process. To that end, I will start by introducing the concept and use cases of streaming processing and two services, Apache Kafka and Schema Registry. Kafka is a message queue system that can handle a large amount of streaming data, and Schema Registry is a service built on top of Kafka that helps us perform schema verification while producing data to Kafka or consuming data from Kafka. Lastly, I will share how to use Python to integrate these two services and implement a reliable streaming process.
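
As a rough illustration of the kind of integration described above, here is a minimal producer sketch using the confluent-kafka Python client with an Avro schema managed by Schema Registry. The broker and registry addresses, topic name, and record schema are illustrative assumptions, not details from the talk.

    # Minimal sketch: produce Avro-encoded messages to Kafka with schema
    # verification via Schema Registry (addresses, topic, and schema are
    # illustrative assumptions).
    from confluent_kafka import Producer
    from confluent_kafka.schema_registry import SchemaRegistryClient
    from confluent_kafka.schema_registry.avro import AvroSerializer
    from confluent_kafka.serialization import SerializationContext, MessageField

    schema_str = """
    {
      "type": "record",
      "name": "UserEvent",
      "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "action",  "type": "string"}
      ]
    }
    """

    registry = SchemaRegistryClient({"url": "http://localhost:8081"})
    avro_serializer = AvroSerializer(registry, schema_str)
    producer = Producer({"bootstrap.servers": "localhost:9092"})

    def delivery_report(err, msg):
        # Called once per message to report delivery success or failure.
        if err is not None:
            print(f"Delivery failed: {err}")
        else:
            print(f"Delivered to {msg.topic()} [{msg.partition()}] @ offset {msg.offset()}")

    event = {"user_id": "u-123", "action": "login"}
    producer.produce(
        "user-events",
        value=avro_serializer(event, SerializationContext("user-events", MessageField.VALUE)),
        on_delivery=delivery_report,
    )
    producer.flush()  # block until all queued messages are delivered

The AvroSerializer registers the schema with Schema Registry when needed and refuses to serialize records that do not match it, which is the schema verification step mentioned above.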

說明 Description
Abstract
In this session, I will start by explaining the difference between batch and streaming processing to help participants establish the basic concepts, and I will introduce the importance and typical use cases of streaming processing. I will then highlight the architecture and purpose of Apache Kafka and Schema Registry. Next, I will show how to implement a streaming process in Python, including producing data, consuming data, and performing data schema verification, through example code and a demo. Lastly, I will share some important settings and show how to fine-tune the producer and consumer to achieve high throughput and low latency in a streaming process.
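
For the consuming side, a matching minimal sketch might look like the following; again the connection details, group id, and topic are assumptions made for illustration.

    # Minimal sketch: consume messages and decode them with the writer schema
    # fetched from Schema Registry (connection details are assumptions).
    from confluent_kafka import Consumer
    from confluent_kafka.schema_registry import SchemaRegistryClient
    from confluent_kafka.schema_registry.avro import AvroDeserializer
    from confluent_kafka.serialization import SerializationContext, MessageField

    registry = SchemaRegistryClient({"url": "http://localhost:8081"})
    avro_deserializer = AvroDeserializer(registry)

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "user-events-demo",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["user-events"])

    try:
        while True:
            msg = consumer.poll(1.0)  # wait up to 1 s for a message
            if msg is None:
                continue
            if msg.error():
                print(f"Consumer error: {msg.error()}")
                continue
            event = avro_deserializer(
                msg.value(), SerializationContext(msg.topic(), MessageField.VALUE)
            )
            print(f"Received: {event}")
    finally:
        consumer.close()

The AvroDeserializer looks up the writer schema by the id embedded in each message, so the consumer always decodes data with the schema it was produced under.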

Session Outline
1. Batch vs. Streaming
- Introducing the difference between batch and streaming processing.
2. Why do we need streaming processing?
- Introducing the importance and use cases of streaming processing.
3. Talk about Apache Kafka
- Introducing Apache Kafka: its purpose, architecture, and components.
4. Talk about Schema Registry
- Introducing Schema Registry: its purpose, schema evolution strategies, and schema verification.
5. Python library - confluent-kafka client
- Introducing the key Python library that helps integrate Apache Kafka and Schema Registry.
6. How to produce a message?
- Demo of example producer code in Python.
7. How to consume a message?
- Demo of example consumer code in Python.
8. Fine-tune the producer and consumer.
- Introducing important settings and how to achieve high throughput on the producer and consumer side (a configuration sketch follows this outline).
9. Conclusion
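
As a rough hint of what the fine-tuning step in item 8 touches on, here is a small configuration sketch with a few librdkafka settings commonly adjusted for throughput; the specific values are illustrative assumptions, not recommendations from the talk.

    # Sketch of producer/consumer settings commonly tuned for throughput
    # (values are illustrative assumptions, not the talk's recommendations).
    from confluent_kafka import Producer, Consumer

    producer = Producer({
        "bootstrap.servers": "localhost:9092",
        "linger.ms": 50,                 # wait up to 50 ms to fill a batch
        "batch.num.messages": 10000,     # allow larger batches per request
        "compression.type": "lz4",       # compress batches on the wire
        "acks": "all",                   # durability vs. latency trade-off
    })

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "tuned-demo",
        "fetch.min.bytes": 1048576,      # wait for ~1 MB of data per fetch...
        "fetch.wait.max.ms": 500,        # ...but no longer than 500 ms
        "enable.auto.commit": False,     # commit after processing (at-least-once)
    })

Larger batches and compression generally raise throughput at the cost of a little extra latency, which is the trade-off the fine-tuning section discusses.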

Reference
1. Apache Kafka Introduction: https://kafka.apache.org/intro
2. Schema Registry Introduction on Confluent: https://docs.confluent.io/platform/current/schema-registry/index.html
3. Schema Registry Introduction on Confluent YouTube: https://www.youtube.com/watch?v=_x9RacHDQY0
4. Confluent's Kafka Python Client - GitHub: https://github.com/confluentinc/confluent-kafka-python
5. Streaming Data: How it Works, Benefits, and Use Cases: https://www.confluent.io/learn/data-streaming/
6. 6 Most Common Streaming Data Use Cases: https://www.upsolver.com/blog/6-most-common-streaming-data-use-cases

講者介紹 About Speaker - 蘇揮原 Mars Su
A Senior ML/Data Engineer at Gogolook. I am currently in charge of implementing streaming ETL infrastructure and NLP-related ML models and applications. I have 4+ years of experience in data science and data engineering, including NLP and streaming (micro-batch) ETL design. My research interests include NLP-related algorithms, models, and papers, streaming data pipelines, and cloud services. I hope I can contribute something to the data world.

#pycontw #pyconapac2022 #python #kafka #streaming

Follow “PyCon Taiwan”
⭐️ Official Website: https://tw.pycon.org
⭐️ Facebook: https://www.facebook.com/pycontw
⭐️ Instagram: https://www.instagram.com/pycontw
⭐️ Twitter: https://twitter.com/PyConTW
⭐️ LinkedIn: https://www.linkedin.com/company/pycontw
⭐️ Blogger: https://pycontw.blogspot.com
...
https://www.youtube.com/watch?v=X0HryRZ7BnQ
Author: Unspecified
Content Type: video/mp4
Language: English