PyCon APAC 2022 | Talks | Title sponsors: Cathay Financial Holdings / Micron
✏️ Note: https://hackmd.io/@pycontw/H1NcUpQki
Slido: https://app.sli.do/event/eTKYkMG74iDjguzjGL2ELM
Slides: https://www.slideshare.net/AndrewLi55/distributing-your-pandas-etl-job-using-modin-and-raypdf
Language: Chinese talk with English slides
Level: Intermediate
Category: Data Analysis
Abstract
Are you using pandas to process data? Do you want to handle a large dataset using pandas? Do you want to develop Python code on your laptop and run it on the cloud or Kubernetes effortlessly? In this talk, I assume you are familiar with pandas, and I will share how to distribute your pandas ETL job by changing a few lines of code (or even just one).
Description
If you work in data science, pandas is a fantastic tool for Python users. According to the official documentation, pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. However, every tool has limitations. pandas can manipulate small data efficiently because it handles the data in memory, which also means it is difficult to process datasets that do not fit in memory.
In this talk, I will share two common cases describing how to distribute your pandas ETL job by changing a few lines of code (or even just one):
- Handling many small datasets which share the same ETL logic
- Handling an out-of-core dataset without rewriting the ETL script
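The first case above (many small datasets sharing one ETL function) can be sketched as a fan-out over independent inputs. This is a minimal illustration, not the speaker's code: it swaps in the standard library's `concurrent.futures` for Ray so that it runs without extra dependencies, and the dataset names, columns, and the `etl` and `run_all` helpers are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import csv
import io

# Hypothetical stand-ins for many small input files that share one schema.
RAW_DATASETS = {
    "store_a": "item,price,qty\napple,1.5,4\npear,2.0,1\n",
    "store_b": "item,price,qty\nmilk,3.0,2\n",
    "store_c": "item,price,qty\nrice,10.0,1\nsalt,0.5,6\n",
}


def etl(name, raw_csv):
    """Shared ETL logic: parse the rows and total price * qty per dataset."""
    rows = csv.DictReader(io.StringIO(raw_csv))
    revenue = sum(float(r["price"]) * int(r["qty"]) for r in rows)
    return name, revenue


def run_all(datasets, max_workers=4):
    # Submit the same function once per dataset, then gather the results.
    # With Ray, the equivalent shape would be a function decorated with
    # @ray.remote, called as etl.remote(...), and collected with ray.get(...).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(etl, n, raw) for n, raw in datasets.items()]
        return dict(f.result() for f in futures)


totals = run_all(RAW_DATASETS)
print(totals)
```

For the second case, the one-line change the abstract alludes to is most likely the import: Modin documents itself as a drop-in replacement, so `import pandas as pd` becomes `import modin.pandas as pd` while the rest of the script stays the same.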
References:
- Ray: https://docs.ray.io/en/latest/
- Modin: https://modin.readthedocs.io/en/latest/
About Speaker - 李泓旻 (Andrew)
I currently work as a data engineer in the financial industry. Previously, I was a one-stop shop for data science in manufacturing, covering data engineering, ETL, modeling, and deployment. I am dedicated to finding the most suitable tool for each need, and I keep contributing to open source projects. LIFE IS SHORT. USE PYTHON.
#pycontw #pyconapac2022 #python #dataanalysis #etl #pandas #ray #modin
Follow “PyCon Taiwan”
⭐️ Official Website: https://tw.pycon.org
⭐️ Facebook: https://www.facebook.com/pycontw
⭐️ Instagram: https://www.instagram.com/pycontw
⭐️ Twitter: https://twitter.com/PyConTW
⭐️ LinkedIn: https://www.linkedin.com/company/pycontw
⭐️ Blogger: https://pycontw.blogspot.com
...
https://www.youtube.com/watch?v=yEviapotudA
Abstract:
To simplify its implementation, CPython uses a single global lock to serialize the execution of threads. In a truly parallel environment, this lock wastes computing power on lock contention, and it becomes the bottleneck as the system grows.
In the previous work "Global Interpreter Lock: Episode I - Break the Seal" at PyCon APAC 2015[1], we focused on how to live with CPython's GIL. In this work, we nullify the effects of the GIL by giving each thread a private GIL.
Without modifying the OS kernel or CPython, the dynamic linker is the easiest way to separate memory namespaces within a process. We will show examples of how to use it and discuss the limitations.
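The serialization the abstract describes can be observed with a small CPU-bound experiment. This is only a sketch of the problem, not the speaker's dynamic-linker technique; the function names and workload size are hypothetical. On a standard GIL-enabled CPython build, the threaded run typically takes about as long as the sequential one. Timings vary by machine, so they are printed rather than asserted.

```python
import threading
import time


def count_primes(limit):
    """CPU-bound work: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_in_threads(limits):
    """Run count_primes once per limit, each call in its own thread."""
    results = [None] * len(limits)

    def worker(index, limit):
        results[index] = count_primes(limit)

    threads = [
        threading.Thread(target=worker, args=(i, limit))
        for i, limit in enumerate(limits)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


limits = [20000] * 4  # hypothetical workload size

start = time.perf_counter()
sequential = [count_primes(limit) for limit in limits]
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
threaded = run_in_threads(limits)
threaded_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.3f}s, threaded: {threaded_seconds:.3f}s")
```

Both runs compute the same answers; only the wall-clock comparison differs between interpreter builds.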
Slide Link:
https://www.slideshare.net/penvirus/global-interpreter-lock-episode-iii-cat-lt-devzero-gil
PyCon Taiwan 2017 official: https://tw.pycon.org/2017/
PyCon Taiwan 2017 Facebook Fan Page: https://www.facebook.com/pycontw/
...
https://www.youtube.com/watch?v=gmCAc1e_ANo
Day 2, 10:35-11:20
Abstract
Jupyter has become a critical component of the machine learning life cycle. However, scaling enterprise deployments and making the data science experience frictionless remain challenging. We address a few common issues with PrimeHub, an open-source enterprise offering based on JupyterHub that can work with popular ML tooling such as MLflow, Label Studio, and Streamlit. This talk also investigates the MLOps trends adjacent to the Jupyter ecosystem.
Description
This talk is intended for audiences interested in deploying larger-scale Jupyter environments in their organisation, particularly for machine learning applications.
PrimeHub is an open-source enterprise offering based on JupyterHub. It also orchestrates tools such as MLflow, Label Studio, and Streamlit, so you can assemble an end-to-end ML toolbox for your team. We address a few common hurdles:
- advanced resource scheduling for heterogeneous GPU clusters
- multi-tenancy and project isolation
- customizable and consistent ML environments per project
- managing JupyterLab extensions for teams
- hybrid development environments such as VS Code and SSH within notebook instances
- authorization and data access management
- usage reporting: allocation and utilization
We also investigate a few trends adjacent to the day-to-day Jupyter environment used by data scientists and data engineers, whose roles are becoming more cross-functional in the age of MLOps:
- Working with job schedulers from within a notebook
- Managing ML model deployments and CI/CD integration
- Model monitoring and retraining
Slides not uploaded by the speaker.
HackMD: https://hackmd.io/@pycontw/2021/%2F%40pycontw%2Frk-phz9zF
Speaker: Chia-liang Kao
clkao (Chia-liang Kao) has been an open source software developer since 2000. He believes that good collaboration models and tools drive innovation. In 2003, he created SVK, a distributed version control system that helps developers collaborate. He co-founded the g0v.tw community in 2012, advocating information transparency and digital activism through the open source model. g0v.tw received the "Digital Communities: Award of Distinction" at Prix Ars Electronica 2018. He started InfuseAI in 2018 to enable data scientists to thrive and to help wider adoption of AI across industries.
...
https://www.youtube.com/watch?v=qf0U1JIgCFY
PyCon APAC 2022 | Talks | Title sponsors: Cathay Financial Holdings / Micron
✏️ Note: https://hackmd.io/@pycontw/r1VGDT71j
Slido: https://app.sli.do/event/kW33cdquuaLPCbPhrXU6Zn
Slides: https://bit.ly/3o6ufGY
Language: Chinese talk with English slides
Level: Intermediate
Category: Education
Abstract
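This talk uses the PAIA machine learning system to show how students learn AI and Python through it. Based on the characteristics of the PAIA system, we explain how we used the Kubernetes API together with Python to re-architect the system and split it into microservices, with the goal of making PAIA easy to deploy, operate, and update quickly. The process addresses four main problems:
1. Using Python together with Kubernetes to build a system architecture based on infrastructure as code.
2. After the move to microservices, a single service can be updated without affecting the other components.
3. Parameterizing system settings through a k8s ConfigMap, which makes deployment to multiple environments (prod, stage, dev) quick and easy.
4. The AI simulation, training, and competition processes are all containerized, and k8s features are used to centralize logs, which makes problems easier to find and isolate.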
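Point 3 above can be illustrated with a short sketch that generates one ConfigMap manifest per environment from a single Python function. This is not the PAIA team's code: the setting keys, hostnames, ConfigMap name, and the choice to use the environment name as the namespace are all hypothetical. Only the manifest shape (`apiVersion: v1`, `kind: ConfigMap`, string-valued `data`) follows the Kubernetes API.

```python
import json

# Hypothetical per-environment settings; key names and values are illustrative.
ENV_SETTINGS = {
    "dev": {"DEBUG": "true", "DB_HOST": "postgres.dev.example", "LOG_LEVEL": "debug"},
    "stage": {"DEBUG": "false", "DB_HOST": "postgres.stage.example", "LOG_LEVEL": "info"},
    "prod": {"DEBUG": "false", "DB_HOST": "postgres.prod.example", "LOG_LEVEL": "warning"},
}


def build_configmap(env, settings, name="backend-config"):
    """Return a ConfigMap manifest as a plain dict.

    ConfigMap data values must be strings, so every value is converted.
    Using the environment name as the namespace is an assumption here.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": env},
        "data": {key: str(value) for key, value in settings.items()},
    }


manifests = {env: build_configmap(env, cfg) for env, cfg in ENV_SETTINGS.items()}

# kubectl accepts JSON manifests as well as YAML, so this output can be applied.
print(json.dumps(manifests["dev"], indent=2))
```

Keeping the environment differences in one data structure means the deployment code itself stays identical across prod, stage, and dev.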
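Description
Our PAIA system uses the Python Django framework, which mainly handles the creation of APIs, including authentication, authorization, and accounting, and connects to a PostgreSQL database. On the client side, PAIA gives users a game environment in which they learn to write code through AI and run it in competition against other AI programs. Besides Python, users can also write their AI code in Blockly, a block-based graphical language (similar to Scratch). After selecting an AI model and training it on data, they upload the result to the PAIA system, where it runs through the following three stages: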
Frontend (Vue/NuxtJS) ==== Backend (Django) ==== MLGame (Python/Pygame)
Apart from the frontend, which does not use Python, the other two parts, the backend and MLGame, both use Python and related technologies. This talk covers the implementation and architecture of the following:
1. Backend: API design, serializers and folder structure in Django, database creation and migration, Redis and the cache mechanism, pika and WebSocket, and AAA (authentication, authorization, accounting) design.
2. MLGame: RabbitMQ consumer (pika) with WebSocket, the Kubernetes client API in Python, and Python multiple inheritance and design patterns in the AI field.
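The consumer pattern named in item 2 can be sketched with a producer that enqueues game jobs and a worker that consumes them. This sketch swaps in the standard library's `queue.Queue` for RabbitMQ and pika so that it runs without a broker; the job fields and the `consumer` function are hypothetical and say nothing about PAIA's actual message format.

```python
import queue
import threading

job_queue = queue.Queue()  # stand-in for a RabbitMQ queue
results = []               # stand-in for status updates pushed back to clients


def consumer():
    """Take jobs off the queue one at a time until a None sentinel arrives."""
    while True:
        job = job_queue.get()
        if job is None:
            job_queue.task_done()
            break
        # A real worker would launch the game run here; this only records it.
        results.append({"job_id": job["job_id"], "status": "finished"})
        job_queue.task_done()


worker = threading.Thread(target=consumer)
worker.start()

# The backend side: publish three hypothetical jobs, then the stop sentinel.
for job_id in (1, 2, 3):
    job_queue.put({"job_id": job_id, "game": "hypothetical-game"})
job_queue.put(None)

job_queue.join()  # block until every queued item has been acknowledged
worker.join()
print(results)
```

Decoupling the backend from the game runner this way lets either side be restarted or updated without the other noticing, which matches the microservice goal stated in the abstract.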
The main outline is:
- What is PAIA?
- Why did we build an AI learning system?
- Why Python?
- How we use Python and its rich libraries to design the architecture and build microservices
- How we handle DevOps and maintenance in production
- How we apply the PAIA system in in-person educational settings
- What is next for the PAIA AI system
- Conclusion
See the following additional material:
- PAIA system website: https://www.paia-arena.com/
- PAIA system introduction video: https://youtu.be/N68eL2ZDeNc
- Notes on the technologies used: https://bit.ly/3IMiyx1
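About Speaker - 邱文淇 (Ivan Chiou)
He has worked at several major companies as a UI engineer, system engineer, system analyst, project manager, product manager, Scrum Master, and front-end lead, and is currently a senior technical manager on the PAIA team. He focuses on research, innovation, and adoption in programming education, remote collaboration, agile thinking, team leadership, and web technologies. He has extensive experience in multimedia integration and cross-disciplinary collaboration, and sees himself as a mentor to the team, bringing hope to young engineers and pointing them in the right direction.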
#pycontw #pyconapac2022 #python #paia #django #mlgame #pygame #pika
...
https://www.youtube.com/watch?v=lNBDTDyyJ8E
PyCon APAC 2022 | Talks | Title sponsors: Cathay Financial Holdings / Micron
✏️ Note: https://hackmd.io/@pycontw/HkTp86XJs
Slido: https://app.sli.do/event/peUkfwVcPamsp56SBtkm8W
Language: English
Level: Intermediate
Category: Other
Abstract
By using Streamlit and streamlit-webrtc, we can create web-based real-time computer vision apps with only about 10 to 20 additional lines of Python code. It is a new, easy way to turn computer vision models into web-based demos, and it has some advantages over the conventional approaches that use OpenCV. In this talk, I will demonstrate the development process using these libraries and show a variety of examples, so that we can see how easy and useful they are and make use of them in daily development and research.
Description
Streamlit is a fast-growing framework with which developers can build web-based data apps using only Python code.
streamlit-webrtc is a custom component for Streamlit that enables real-time video/audio streams in Streamlit apps.
By combining these libraries, we can create web-based real-time video apps quickly.
The speaker is the author of streamlit-webrtc and has been selected as a member of the Streamlit Creators program for contributions including it.
A lightning talk about these technologies at PyCon JP 2021 is available: https://youtu.be/_LuLs8H1gJc
The following articles about development with these libraries have been published:
- Developing Web-Based Real-Time Video/Audio Processing Apps Quickly with Streamlit
- Real-Time Video Streams With Streamlit-WebRTC
As linked from the streamlit-webrtc repo, demo apps I have developed are available online:
- Demo showcase including real-time object detection: https://share.streamlit.io/whitphx/streamlit-webrtc-example/main/app.py
  - Source code: https://github.com/whitphx/streamlit-webrtc-example/blob/main/app.py
- Real-time Speech-to-Text: https://share.streamlit.io/whitphx/streamlit-stt-app/main/app_deepspeech.py
  - Source code: https://github.com/whitphx/streamlit-stt-app
- Real-time style transfer: https://share.streamlit.io/whitphx/style-transfer-web-app/main/app.py
  - Source code: https://share.streamlit.io/whitphx/style-transfer-web-app/main/app.py
- Real-time Tokyo 2020 Pictogram: https://share.streamlit.io/whitphx/tokyo2020-pictogram-using-mediapipe/streamlit-app
  - Source code: https://github.com/whitphx/Tokyo2020-Pictogram-using-MediaPipe
- Video chat: online demo is not available because it does not have an auth mechanism and is only for private use.
  - Source code: https://github.com/whitphx/streamlit-video-chat-example
About Speaker - Yuichiro Tachibana
Yuichiro works as a professional software developer and also loves contributing to OSS projects. As a Pythonista, he has participated in various projects including web development, multimedia streaming, data management, computer vision, and machine learning.
#pycontw #pyconapac2022 #python #opencv #streamlit #webrtc #computervision
...
https://www.youtube.com/watch?v=ruSN7t2eor4