Created On: 6 Aug 2023 15:33:33 UTC
Cost: Free
Safe for Work: Yes
how-to-clean-data-part-2
ACCESS the FULL COURSE here: https://academy.zenva.com/product/data-science-mini-degree/?zva_src=youtube-datascience-md
TRANSCRIPT
So here we are in Spyder, and like I mentioned, we have some smaller data cleaning tasks that we can do and some other pandas functionality that I want to show you. In particular, one interesting thing you might note is that if I run flights.dtypes again, just to view all of our data types, you'll see that we converted the flight date to an actual date before, but we also have YEAR, MONTH, and DAY_OF_MONTH columns. So all of those columns exist, but they're already incorporated into FL_DATE, the flight date column. So these columns don't really seem to be necessary. In fact, you could go either way: you can keep one column, the flight date, or you can keep three columns: YEAR, MONTH, and DAY_OF_MONTH.
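As a rough illustration of the redundancy described above, the following sketch builds a tiny stand-in for the flights DataFrame and checks that YEAR, MONTH, and DAY_OF_MONTH can all be recovered from FL_DATE. The sample rows are made up for the example, since the real dataset is not shown in the transcript; only the column names come from the video.

```python
import pandas as pd

# Hypothetical stand-in for the flights data; these rows are invented for illustration.
flights = pd.DataFrame({
    "FL_DATE": ["2017-01-01", "2017-01-15", "2017-02-03"],
    "YEAR": [2017, 2017, 2017],
    "MONTH": [1, 1, 2],
    "DAY_OF_MONTH": [1, 15, 3],
})

# to_datetime returns a new Series, so the result is assigned back to the column.
flights["FL_DATE"] = pd.to_datetime(flights["FL_DATE"])

# View the data type of every column, as in the transcript.
print(flights.dtypes)

# Each separate column can be rebuilt from FL_DATE through the .dt accessor.
redundant = bool(
    (flights["FL_DATE"].dt.year == flights["YEAR"]).all()
    and (flights["FL_DATE"].dt.month == flights["MONTH"]).all()
    and (flights["FL_DATE"].dt.day == flights["DAY_OF_MONTH"]).all()
)
print(redundant)
```

If a check like this comes back true on the real data, the three columns carry no information beyond what FL_DATE already holds, which is the case for dropping them.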
So keeping them might be useful if you're doing any kind of analysis that involves looking at each of these individually. But if you're doing a more holistic kind of analysis that just deals with the dates, then you might as well get rid of these columns and keep the one column, so that you're saving space and you have fewer columns to work with. So what we're gonna do is remove these three columns: YEAR, MONTH, and DAY_OF_MONTH, because that information is already encapsulated in FL_DATE.
So to drop columns, all we have to do is say flights.drop, and then here we say columns= and give it a list of the columns that we want to drop. It's quite easy to do. And one additional thing that we'll do is say inplace=True, and what this does is modify the DataFrame directly.
What we did before, if I just call astype or to_datetime, doesn't actually modify the flights DataFrame. It returns a new DataFrame or a new column, and we set that appropriately. So that's why we have to write flights['FL_DATE'] = this, because the call by itself doesn't actually change the column. It just makes the modification and gives you the result back. However, there are some cases where I can set inplace=True, and then the call will actually modify the DataFrame. So now I don't have to say something like flights = whatever; I can just say flights.drop with inplace=True, and that will actually change the DataFrame.
Okay, so there are three columns that I wanna drop: YEAR, DAY_OF_MONTH, and MONTH. We'll do YEAR, and then MONTH, and then DAY_OF_MONT
...
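The drop call described in the transcript might look something like the sketch below. The sample rows and the CARRIER column are hypothetical, added only so that something besides FL_DATE remains after the drop. The example contrasts the default behavior, where drop returns a new DataFrame, with inplace=True, where the existing DataFrame is changed.

```python
import pandas as pd

# Hypothetical stand-in for the flights data; rows and the CARRIER column are invented.
flights = pd.DataFrame({
    "FL_DATE": pd.to_datetime(["2017-01-01", "2017-01-15"]),
    "YEAR": [2017, 2017],
    "MONTH": [1, 1],
    "DAY_OF_MONTH": [1, 15],
    "CARRIER": ["AA", "DL"],
})

to_drop = ["YEAR", "MONTH", "DAY_OF_MONTH"]

# Default behavior: drop returns a new DataFrame and leaves flights untouched.
trimmed = flights.drop(columns=to_drop)
columns_before = list(flights.columns)  # still includes the three date columns

# With inplace=True, flights itself is modified, so no reassignment is needed.
flights.drop(columns=to_drop, inplace=True)

print(columns_before)
print(list(flights.columns))
```

Writing flights = flights.drop(columns=to_drop) would give the same end result as the inplace=True call; which form to use is largely a matter of style.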
https://www.youtube.com/watch?v=cQG8a8o1udw
Content Type: video/mp4