Write Your Own Micro Data Processing Framework in Python|David Chen|PyCon TW 2016
PyCon Taiwan 2016 | Talks
Abstract: Data processing frameworks are the core element of Big Data. They provide good abstractions for computing resources and logic. In this talk, I will use Google MapReduce (written in Python) to introduce key components used in common data processing frameworks, such as message queues, pipelines, object collection, fault tolerance, and task flow. Then I will use a micro framework written in Django to demo how data processing works.
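As a minimal sketch of the map/shuffle/reduce flow such a framework is built around (illustrative only; the function names and the word-count example are not from the talk's framework):

from collections import defaultdict

def map_phase(documents):
    # Emit (word, 1) pairs from each document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Group intermediate values by key, the role a message queue / pipeline
    # stage plays in a real framework.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Aggregate the values collected for each key.
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    docs = ["big data is big", "data processing in python"]
    print(reduce_phase(shuffle(map_phase(docs))))
    # {'big': 2, 'data': 2, 'is': 1, 'processing': 1, 'in': 1, 'python': 1}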
About Speaker - David Chen: GliaCloud founder and co-organizer of GCPUG. GliaCloud is a startup focused on AI and data analysis. I like coding, ramen, and skiing. Most of the time, I work on Python code on Google Cloud Platform for big data processing and cloud architecture design. It is my pleasure to join PyCon with so many passionate community members.
Speaker: Chia-Chi Chang
https://docs.google.com/document/d/1LwQG8pLLO2PEviExoU3xiqyclnaT9kfhwsCGVKVRB2I/edit?usp=sharing
There are several data mining tools in Python, and you can use them to deal with almost every kind of data (numeric, text, image, audio, ...) you meet. There are also lots of modeling tools in Python, which you can use to build the FIRST LIGHTNING MODEL to solve your problems.
However, if you want to solve problems deeply, most of the time you need to write your own customized models and solve them yourself. Instead of relying on fast modeling tools, you need to know more about the essential things in modeling:
- What is a model?
- How do models solve your problems?
- What is the connection between models and data?
- What is the important data? What is the important model?
I deeply believe that the more you know about the connection between models and data, the deeper the problems you can solve.
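As a hedged illustration of that connection (my own toy example, not material from the talk): given data and an evaluator (a loss function), finding a model amounts to searching for the parameters the evaluator scores best.

import numpy as np

# Data: noisy samples of y = 2x + 1 (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2 * x + 1 + rng.normal(scale=0.1, size=x.size)

# Evaluator: mean squared error of a candidate linear model (w, b)
def evaluate(params):
    w, b = params
    return np.mean((w * x + b - y) ** 2)

# Direct problem: Data + Evaluator -> Model, here by brute-force grid search
candidates = [(w, b) for w in np.linspace(0, 3, 61) for b in np.linspace(0, 2, 41)]
best = min(candidates, key=evaluate)
print("best (w, b):", best)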
Outline:
What is Modeling?
Data, Model, Evaluators
Direct Problem: Data + Evaluators → Model
Inverse Problem: Data + Models → Evaluator
Hacking Models with Metric Learning
Data as a Model & Model as a Data
Duality between Dimension Reduction and Clustering
...
https://www.youtube.com/watch?v=PznPp-BbwyU
Day 1, R1 15:55–16:10
Productization of machine learning (ML) solutions can be challenging. Therefore, the concept of operationalization of machine learning (MLOps) has emerged in the past few years for effective model lifecycle management. One of the core aspects of MLOps is "monitoring".
ML models are built by experimenting with a wide range of datasets. However, since real-world data continues to change, it is necessary to monitor and manage the usage, consumption, and results of models.
MLflow is an open-source framework designed to manage the end-to-end ML lifecycle with different components. In this talk, the basic concepts of MLflow will be introduced. Then, MLflow Tracking will be the main focus. You will learn how to track experiments, recording and comparing parameters and results, with MLflow Tracking.
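For example, a minimal MLflow Tracking sketch looks like the following (the experiment name, parameter values, and metric value are placeholders, not examples from the talk):

import mlflow

# Point the client at a tracking server if you have one; by default runs are
# recorded in a local ./mlruns directory.
# mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("demo-experiment")  # hypothetical experiment name

with mlflow.start_run(run_name="baseline"):
    # Record the parameters used for this run
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 100)

    # ... train and evaluate the model here ...

    # Record the resulting metrics so runs can be compared in the MLflow UI
    mlflow.log_metric("rmse", 0.78)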
Slides: https://speakerdeck.com/sucitw/track-machine-learning-applications-by-mlflow-tracking
Speaker: Shuhsi Lin
A data engineer and Python programmer, currently working on various data applications in a manufacturing company.
Research interests: IoT applications, data stream processing, data analysis, and data visualization.
...
https://www.youtube.com/watch?v=76QWG9di1Hs
PyCon APAC 2022 | Talks | Title sponsors: Cathay Financial Holdings / Micron
✏️ Note: https://hackmd.io/@pycontw/rJ5aL67yo
Slido: https://app.sli.do/event/ek2NTzkytzYkU1XVkpvdzk
Language: English
Level: Intermediate
Category: Application
Abstract
Finding a working shift schedule can be a tedious task: there are usually various requirements to juggle all at once, from time-offs, shift types, and conflicting schedules to personal preferences, and one can easily miss a requirement or two by mistake. Instead, we can use so-called "solvers" and leave the heavy lifting to the computer to find not just a valid schedule, but a good one; as it turns out, the same technique also works for a broad spectrum of problems such as conference scheduling, vehicle routing, bin packing, and more.
Description
This talk will likely interest those who like to explore using Python to solve logical problems, because in some ways, finding a shift schedule is a lot like playing Sudoku. There are a lot of holes that you need to fill, but only certain elements can fit in them. So if we can write programs that solve Sudoku, then surely we can write programs that produce a good shift schedule, right?
Indeed we can, but there is an even better approach. Instead of writing our own custom program that specifically schedules shifts, we can use off-the-shelf solvers that are very good at solving a more general kind of problem, and transform our original problem into a form they understand. Behind the scenes, these off-the-shelf solvers use techniques similar to those of the Sudoku solver mentioned above (e.g. backtracking and constraint propagation), albeit much more advanced, since tons of optimization and clever tuning have gone into making them work well.
By the end of the talk, you'll know the basics of Z3's Python API, how to formulate the shift scheduling problem in propositional logic, and have a rough idea of how the same approach can be applied to similar problems such as conference scheduling, vehicle routing, bin packing, and more.
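As a rough sketch of that formulation (the names, week length, and constraints below are my own illustration, not the exact model from the talk): one Boolean per (person, day) pair, with constraints encoding coverage, workload limits, and time-offs.

from z3 import Bool, Solver, Or, AtMost, Not, is_true, sat

people = ["amy", "bob", "cat"]
days = range(7)

# works[p, d] is True iff person p is on shift on day d
works = {(p, d): Bool(f"works_{p}_{d}") for p in people for d in days}

s = Solver()
for d in days:
    on_duty = [works[p, d] for p in people]
    s.add(Or(on_duty))          # every day is covered by someone...
    s.add(AtMost(*on_duty, 1))  # ...but by at most one person

for p in people:
    # nobody works more than 3 days in this (hypothetical) week
    s.add(AtMost(*[works[p, d] for d in days], 3))

s.add(Not(works["amy", 0]))     # a time-off request: amy is off on day 0

if s.check() == sat:
    m = s.model()
    for d in days:
        print(d, [p for p in people if is_true(m.evaluate(works[p, d]))])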
The audience is expected to have a good understanding of list comprehensions and itertools, since both will be used heavily throughout the talk. Ideally, some experience with predicate logic is preferred, though it is not strictly necessary. No domain knowledge is required.
- Side note: Z3 is an immensely powerful tool; this talk can only cover the tip of the iceberg in terms of what it can do. Other interesting uses of Z3 include software verification, program synthesis, exploit generation, code simplification, etc.
About Speaker - Shung-Hsi Yu
Kernel Engineer at SUSE working on BPF
#pycontw #pyconapac2022 #python #shiftschedule #algorithm #z3
Follow “PyCon Taiwan”
⭐️ Official Website: https://tw.pycon.org
⭐️ Facebook: https://www.facebook.com/pycontw
⭐️ Instagram: https://www.instagram.com/pycontw
⭐️ Twitter: https://twitter.com/PyConTW
⭐️ LinkedIn: https://www.linkedin.com/company/pycontw
⭐️ Blogger: https://pycontw.blogspot.com
...
https://www.youtube.com/watch?v=T6Q2fPnPgUU
Day 2, R2 11:45–12:15
The black-box problem has become a growing concern when applying machine learning in specific applications, like medical systems, where a user is supposed to understand the behavior of the system. Collecting tons of data to train a machine learning model is another headache, especially when you are building a system from scratch. In this talk, I introduce a data analysis approach called "Sparse Modeling" that can produce good results even when the amount of data is small; the Event Horizon Telescope project, which captured the black hole image, is one good example. Sparse Modeling is also referred to as explainable, since it can tell you which input features have a strong impact on the result generated by a machine learning model. After an overview of the method, I'll show concrete code examples for common use cases like image analysis, using a Python library named spm-image.
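The spm-image examples are shown in the talk itself; as a stand-in illustration of the sparse-modeling idea (using scikit-learn's Lasso rather than spm-image's API, and synthetic data), an L1 penalty drives most coefficients to zero, so the surviving nonzero features are the ones that matter:

import numpy as np
from sklearn.linear_model import Lasso

# Small synthetic dataset: only 3 of 50 features actually influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 50))
true_coef = np.zeros(50)
true_coef[[3, 17, 42]] = [2.0, -1.5, 1.0]
y = X @ true_coef + rng.normal(scale=0.1, size=40)

# The L1 penalty pushes irrelevant coefficients to exactly zero
model = Lasso(alpha=0.1).fit(X, y)
important = np.flatnonzero(model.coef_)
print("features with nonzero weight:", important)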
Slides: https://speakerdeck.com/hacarus/getting-started-with-sparse-modeling-with-spm-image
Speaker: Takashi Someda
After getting his master's degree in informatics at the Graduate School of Kyoto University, he started his career at Sun Microsystems as an engineer.
Over roughly 20 years in the software industry, he has taken on several roles, including software developer, technical evangelist, and data scientist.
Now, as CTO of Hacarus, he is responsible for technical direction, with a strong passion for building a creative, self-organized team like Pixar.
...
https://www.youtube.com/watch?v=KjPyDhbqzKE
Speaker: Lee Yang Peng
I developed and evaluated Analytics, a tool that analyses packet data to learn information about network protocol formats. Analytics attempts to discover constants and enumeration fields in packet data, while providing visualization to aid analysts. My experiments on fixed-length protocol headers show that the heuristics implemented in Analytics for detecting constants and enumeration fields are mostly accurate: it has an average accuracy of 76.8% in detecting constants and 88.6% in detecting enumeration fields. Because Analytics consists of heuristics that detect the targeted fields in network traces, it can also be applied to proprietary or unknown protocols.
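A hypothetical sketch of such a heuristic (the threshold, function name, and toy header are my own, not Analytics' actual implementation): look at each byte offset across many packets and classify it by how many distinct values appear there.

from collections import Counter

def classify_offsets(packets, enum_threshold=8):
    # packets: list of bytes objects of equal (header) length.
    # enum_threshold: max number of distinct values to still call a field an enumeration.
    length = min(len(p) for p in packets)
    results = {}
    for offset in range(length):
        values = Counter(p[offset] for p in packets)
        if len(values) == 1:
            results[offset] = "constant"
        elif len(values) <= enum_threshold:
            results[offset] = "enumeration"
        else:
            results[offset] = "variable"
    return results

# Example: a 4-byte toy header -- magic byte, version, type, sequence number
packets = [bytes([0xAB, 0x01, n % 3, n]) for n in range(200)]
print(classify_offsets(packets))
# offsets 0 and 1 -> constant, 2 -> enumeration, 3 -> variable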
From my talk, the audience can learn about network security and its significance. Poor network security can leave vulnerabilities in an organization, which may result in commercial espionage, the leakage of company secrets, or the takeover of computers connected to the network to perform illegal activities. The audience can also benefit from learning about Deep Packet Inspection, a common process used in large organizations to maintain network security and prevent the transfer of malicious data through a network. Experts in the field can appreciate the tool, Analytics, which demonstrates the use of Python in garnering information about unknown network protocol formats.
About the speaker
I'm a 16-year-old student from Dunman High School.
Organization/Company: Dunman High School
Title: Student
...
https://www.youtube.com/watch?v=7qsixKitI18