Data science and exploratory data analysis can help us in many areas: guiding everyday decisions, saving time, increasing income and profitability, and more. The first step is to clean data sets that are crowded with unnecessary information so that they become useful for our business. From the cleaned data we can then extract statistics and draw conclusions, or automate the process with machine learning and artificial intelligence.
The Project Scenario
Let’s imagine that we are at the beginning of 2016. My employer has asked me to identify the most punctual domestic airlines in the US because his travel schedule will be quite busy and he does not want to lose even a minute to delays and cancellations. The airlines I identify as the most punctual will therefore be his first choice when buying tickets.
Or let’s assume there is a famous traveller who takes at least two flights a day. The traveller wants to earn more miles to reduce flight costs. At the same time, he or she wants to choose the airlines with the fewest cancellations, because the traveller wants to buy tickets well in advance at a lower price. By flying mostly with the same airlines, the traveller will also be able to collect a large number of miles.
While browsing open data sources on the web, I found a dataset named “2015 Flight Delays and Cancellations”.
Meeting the Dataset
After loading the data set, I wanted to see which columns it has and which ones I could use. I also wanted to know how many values each column holds, how many of them are null, and what the summary statistics of each column look like (min, max, standard deviation).
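A first inspection like this can be sketched with pandas roughly as follows. This is only an illustration: the small DataFrame stands in for the real file, its values are made up, and the column names other than ‘MONTH’ are assumptions about what the dataset contains.

```python
import pandas as pd

# Toy stand-in for the real dataset; values are made up for illustration.
# Column names other than MONTH are assumptions, not taken from the article.
flights = pd.DataFrame({
    "MONTH": [1, 1, 7, 7, 12],
    "AIRLINE": ["XX", "YY", "XX", "YY", "XX"],  # hypothetical airline codes
    "DEPARTURE_DELAY": [5.0, None, 30.0, -2.0, None],
    "CANCELLED": [0, 1, 0, 0, 1],
})

# Column names, dtypes and non-null counts in one overview.
flights.info()

# Number of nulls per column.
null_counts = flights.isnull().sum()
print(null_counts)

# Summary statistics of the numeric columns (count, mean, std, min, max, ...).
summary = flights.describe()
print(summary.loc[["min", "max", "std"]])
```

On the real data, the DataFrame would come from reading the downloaded file (for example with `pd.read_csv`) instead of being built by hand.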
I made the necessary adjustments to the columns that affect the airlines’ flight times, cancellations and delays. By combining these values with each airline’s annual number of flights, I created rates for each airline. Taking all the important values into account, these rates give us a ranking table.
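One possible way to build such rates is sketched below. It is not necessarily the exact calculation used for the results in this post: the column names ‘AIRLINE’, ‘ARRIVAL_DELAY’ and ‘CANCELLED’ are assumptions, a flight is counted as delayed when its arrival delay is above zero minutes (also an assumption), and the rows are invented toy data.

```python
import pandas as pd

# Toy data with hypothetical airline codes; all values are made up.
flights = pd.DataFrame({
    "AIRLINE": ["XX", "XX", "XX", "XX", "YY", "YY", "YY", "YY"],
    "ARRIVAL_DELAY": [10.0, -3.0, None, 0.0, -5.0, 2.0, -1.0, -4.0],
    "CANCELLED": [0, 0, 1, 0, 0, 0, 0, 0],
})

# Assumption: any positive arrival delay counts as "delayed".
# Cancelled flights have no delay value; NaN > 0 evaluates to False.
flights["DELAYED"] = flights["ARRIVAL_DELAY"] > 0

# Divide the delayed and cancelled counts by each airline's total flights.
rates = flights.groupby("AIRLINE").agg(
    total_flights=("CANCELLED", "size"),
    cancelled_rate=("CANCELLED", "mean"),
    delayed_rate=("DELAYED", "mean"),
)

# Combined share of flights that were delayed or cancelled.
rates["problem_rate"] = rates["cancelled_rate"] + rates["delayed_rate"]

# Lowest rate first: the most punctual airline is at the top.
ranking = rates.sort_values("problem_rate")
print(ranking)
```

Using rates rather than raw counts keeps airlines with very different flight volumes comparable.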
I also wanted to use the day and month values in the dataset. Evaluating airline performance by season (climate conditions) gives more detailed and robust results. To do this, I created a new column named ‘CLIMATE’ and wrote code that assigns a climate to each row based on its ‘MONTH’ value.
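A minimal sketch of this step could look like the following. The month-to-season mapping shown here (December to February as winter, and so on) is an assumption for illustration; the season labels are hypothetical and the original assignment may differ.

```python
import pandas as pd

# Assumed mapping from month number to season; labels are illustrative.
SEASON_BY_MONTH = {
    12: "WINTER", 1: "WINTER", 2: "WINTER",
    3: "SPRING", 4: "SPRING", 5: "SPRING",
    6: "SUMMER", 7: "SUMMER", 8: "SUMMER",
    9: "AUTUMN", 10: "AUTUMN", 11: "AUTUMN",
}

# Toy rows; only the MONTH column matters for this step.
flights = pd.DataFrame({"MONTH": [1, 4, 7, 10, 12]})

# Look up each row's month in the mapping to fill the new column.
flights["CLIMATE"] = flights["MONTH"].map(SEASON_BY_MONTH)
print(flights)
```

A dictionary lookup with `Series.map` avoids a chain of if/else conditions and runs on the whole column at once.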
For stronger and more detailed results, I used groupby tables with the climate categories. Then, with the new rates, I sorted the airlines by season.
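The seasonal grouping might be sketched like this. The ‘PROBLEM’ flag (1 when a flight was delayed or cancelled) and the ‘AIRLINE’ column are hypothetical names, and the rows are made-up toy data.

```python
import pandas as pd

# Toy data: PROBLEM is a hypothetical 0/1 flag for "delayed or cancelled".
flights = pd.DataFrame({
    "CLIMATE": ["WINTER"] * 4 + ["SUMMER"] * 4,
    "AIRLINE": ["XX", "XX", "YY", "YY", "XX", "XX", "YY", "YY"],
    "PROBLEM": [1, 1, 1, 0, 0, 0, 1, 0],
})

# Rate of problem flights per season and airline, best airline first
# within each season.
seasonal = (
    flights.groupby(["CLIMATE", "AIRLINE"])["PROBLEM"]
    .mean()
    .rename("problem_rate")
    .reset_index()
    .sort_values(["CLIMATE", "problem_rate"])
)
print(seasonal)

# First row of each season after sorting = most punctual airline there.
best_per_season = seasonal.groupby("CLIMATE")["AIRLINE"].first()
print(best_per_season)
```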
Data Conclusions and Visual Results
Of course, the most effective way to present the result of an analysis is to express it visually. The Matplotlib and Seaborn libraries are very useful for that.
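As a rough illustration of how such a bar chart could be drawn with Matplotlib, here is a small sketch. The airline codes and rates are invented placeholders rather than results from the dataset, and the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rates with hypothetical airline codes; not real results.
rates = pd.DataFrame({
    "AIRLINE": ["XX", "YY", "ZZ"],
    "problem_rate": [0.40, 0.25, 0.35],
}).sort_values("problem_rate")

fig, ax = plt.subplots(figsize=(6, 4))

# One bar per airline, ordered from most to least punctual.
ax.bar(rates["AIRLINE"], rates["problem_rate"])
ax.set_xlabel("Airline")
ax.set_ylabel("Share of delayed or cancelled flights")
ax.set_title("Toy example: delayed and cancelled flight rates")

fig.tight_layout()
fig.savefig("airline_rates.png")
```

Sorting before plotting makes the ranking readable directly from the chart.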
The bar charts below show the rates of delayed and cancelled flights for each airline and climate:
To sum up, the three most punctual airlines are the same in every climate, but their order changes.
Airline punctuality over the whole year: