AIRLINE DIGITAL CLICK STREAM EVENT PROCESSING FOR ENRICHING THE AIRLINE BUSINESS

The digital era, with the rapid expansion of social networks and mobile applications, has widened the scope for the airline industry to promote its business in new ways. With the many social media and other digital platforms now available, airlines need to emphasize target marketing and customer profiling. To support target marketing, a new web technology is created to collect every raw event from web and mobile app data and so track how users search for flights. In the proposed method, BigQuery is used to process the huge volume of online customers' data. The aim is to understand the visitors of an airline ecommerce site effectively by analysing the event data stream collected from its various digital properties. The raw digital data obtained contain a lot of information in semi-structured form and need to be cleansed before analysis. The proposed system therefore first extracts the data from the various digital sources in real time, then chooses which data are appropriate for analysis, and finally extracts the key insights to improve the airline business. From the extracted variables and search patterns, predictive models such as flight search forecasts, seat sales forecasts and digital channel attribution models can be developed.


INTRODUCTION
In recent years, the prime focus of most Asian airlines has been digital transformation (O'Connell & Williams, 2005). The prime objectives of digital transformation are to understand online customer acquisition, digital channel attribution, online customer segmentation, and customers' search trends. These are the most important techniques for taking the right business action at the right time to increase revenue. Because most airlines have their own online and mobile ecommerce platforms, it is possible to track and record visitors' activities on the website: from which webpage they entered, when and what they search, where they drop off, what they purchase, how frequently they book, and so on (Klein & Loebbecke, 2000). These visitor data can be used for customer analytics such as online customer profiling and sales funnels, to understand at which point visitors drop off and whether or not they are price sensitive.
However, tracking and processing visitors' raw events from website log data is complicated because of the large volume of hit-level data: one of the major Asian airlines has about 15 million online visitors per month, which generates roughly 3-5 billion events of unstructured or semi-structured web tracking data (Ananthi, 2014). In this paper, the online digital click stream dataset is obtained from one of the major Asian airlines, which serves 50 destinations. Each route is tracked for one-way and return flights over 30 to 120 days. This paper focuses mainly on real-time digital data collection and pre-processing of the dataset for flight sales prediction. The overall objective of the proposed work is to select, from the extracted digital click stream data, the key variables that can improve the airline business.

LITERATURE REVIEW
The growth of the Internet around the world has made airlines change the way they attract passengers (Singh & Jain, 2014). The digital era has also made it possible to buy tickets from anywhere in the world at any time while comparing different airlines. As a result, predicting ticket prices and attracting passengers are becoming very difficult, since many factors are at play (Gillen & Lall, 2004). However, data science offers a way forward in such scenarios by studying patterns and predicting sales outcomes. For example, it can identify the correlation between the seat prices of particular airlines and air traffic delays. According to Forbes (2008), every minute of flight delay affects ticket prices by about $1.5. Low cost airlines offer ticket prices that exclude baggage, food and beverages, which makes air travel affordable to more people (Groves & Gini, 2013). Based on various studies of the airline business, the most important aspects of buying tickets online in advance depend on the user's observation and perceived risk (Etzioni, Tuchinda, Knoblock & Yates, 2003). Users who purchase their tickets online should have a sense of control over the task they are performing over the Internet. This helps to reduce the feeling of risk or fear associated with the possibility of making a mistake when booking a flight online (psychological risk), or of not receiving the ticket or the flight not even existing (performance risk) (Brons, Pels, Nijkamp & Rietveld, 2002). Several research papers describe promotions on ticket prices, gift vouchers, airline points and upgrades, which play an indirect role in attracting customers (Barrett, 2004; Gillen & Lall, 2004).
The majority of these studies conclude that the incentives employed have a positive effect on airline ticket purchase and repeat purchase, and highlight that the effectiveness of a program depends to a large extent on the particular incentive offered (Aviasales, n.d.). The literature on airline choice makes it clear that both the benefits provided by frequent flyer programs and air fares significantly affect users' choices (Groves & Gini, 2013). Users who travel for business perceive frequent flyer programs as more useful than other users do. These authors even affirm that business travelers are willing to pay more in exchange for shorter access times, travel with top-ranked airlines, and travel in a better class (O'Connell & Williams, 2005; Sabre, 2015).

IMPLEMENTATION OF DIGITAL EVENT DATA PROCESSING
In recent years, most people have entered the digital era, which has increased ecommerce transactions vastly compared with offline transactions. The digital world also lets people reach the world from anywhere at any time through social media, travel blogs or meta search engines. With these resources, travellers can consult different travel websites and travel blogs to compare prices before they book their flight tickets. This opens many opportunities for airlines to track travellers' search patterns and predict passengers' behaviour using predictive models. It is also possible to find which online channel is most effective for which airline routes and geo locations in order to predict the cost per acquisition, which in turn saves a lot of advertising cost. Further, successful tracking of all the digital data enables airlines to build sales funnels for digital products, calculate customer lifetime value, and perform other predictive modelling for digital marketing.

DATA COLLECTION
To collect the online digital data and analyze its patterns, five types of variables are considered for better prediction of seat sales, which are: • Visitor.
The transactional and operational data were extracted in 2016 from various channels such as web, mobile and tablet. Collecting digital data in real time is a complicated process, but with the evolution of JavaScript tagging frameworks it is possible to track each web page and its components based on the visitor's status on the internet. The tracked passenger activities include which pages they search, how much time they spend on each webpage, and how many clicks and scrolls they make on each page. Ecommerce-related information, such as add-to-cart actions, product information and ecommerce transaction details, is tracked as well. As the flight sales digital web data are very large and complex, the data are collected, cleansed and processed using cloud technology. The implementation of digital analytics will help marketing to monitor the load factor (%) of future flights and how travellers choose between origin hub, destination hub and other connecting hubs using fly-through (transit). Figure 1 shows the detailed block diagram of the airline data collection from various sources and its predictive model. The collected data give the details of the airline customers' search patterns. From the webserver log, customer details (such as computer information, location, hostname, browser type, and browsing language) are extracted.
In the proposed research, BigQuery is used to process the high volume of customers' digital data. BigQuery is a RESTful web service that enables interactive analysis of massively large datasets and works in conjunction with Google Cloud Storage. It is a fully managed cloud service that may be used complementarily with MapReduce. Each digital property is exported as raw tables, which are available in BigQuery as multiple daily tables. BigQuery then processes the raw data to the next level using SQL syntax. Figure 2 shows the airline flight search data processing flow. Figure 3 shows the airline online traffic and search data processing flow from all airline digital properties in a daily aggregation.
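As an illustration of how multiple daily tables can be read with one SQL statement, the following minimal sketch builds a BigQuery standard SQL query that uses a wildcard table and the `_TABLE_SUFFIX` pseudo-column. The project, dataset, table prefix and column names are hypothetical and are not taken from the airline's actual export; the sketch also assumes that daily tables are named with a `YYYYMMDD` suffix.

```python
from datetime import date


def build_daily_tables_query(project: str, dataset: str, prefix: str,
                             start: date, end: date) -> str:
    """Return a standard SQL query that reads a date range of daily tables.

    Assumes daily tables are named <prefix>YYYYMMDD (an assumption, not
    something stated in the paper). Column names are illustrative only.
    """
    if start > end:
        raise ValueError("start must not be after end")
    first = start.strftime("%Y%m%d")
    last = end.strftime("%Y%m%d")
    # The wildcard matches every daily table; _TABLE_SUFFIX restricts the range.
    return (
        "SELECT visitorId, visitId, origin, destination, "
        "searchDate, departureDate\n"
        f"FROM `{project}.{dataset}.{prefix}*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{first}' AND '{last}'"
    )
```

The function only builds the query text, so it can be inspected or tested without a BigQuery connection; the resulting string would then be submitted through whichever BigQuery client is in use.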
After tracking is set up to capture the web and mobile digital properties and the listed attributes, the captured data are exported to BigQuery on a periodic basis. In general, the open source tracking code retrieves web page data as follows:
• A browser requests a web page that contains the tracking code.
• A JavaScript array is created and tracking commands are pushed onto the array.
• A <script> element is created and enabled for asynchronous loading (loading in the background).
• The ga.js tracking code is fetched, with the appropriate protocol automatically detected. Once the code is fetched and loaded, the commands on the array are executed and the array is transformed into a tracking object. Subsequent tracking calls are made directly to the server.
• The script element is loaded into the DOM.
• After the tracking code collects data, the GIF request is sent to the analytics database for logging and post-processing.
GIF requests can be classified into a few types, which are listed in Table 1. In each case, the type of the GIF request is identified by the utmt parameter. The type of the request also determines which data are sent to the Analytics servers. For example, transaction and item data are sent to the Analytics servers only when a purchase is made. User, page, and system information is sent only when an event is recorded or when a page loads, and the user-defined value is sent only when the _setVar method is called. Figure 4 shows the BigQuery processing flow to predict the sector levels.
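The classification of a GIF request by its utmt parameter can be sketched as follows. This is a minimal illustration, not the tracking vendor's implementation: the set of utmt values ('event', 'tran', 'item', 'var') and the rule that a missing utmt denotes a page view are assumptions based on the legacy ga.js request format and should be checked against Table 1. The host name in the usage example is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed utmt values for non-page requests (verify against Table 1).
KNOWN_TYPES = {"event", "tran", "item", "var"}


def classify_gif_request(url: str) -> str:
    """Return the request type encoded in the utmt query parameter.

    A request without utmt is treated as a page view (an assumption);
    any unrecognised value is reported as 'unknown'.
    """
    query = parse_qs(urlsplit(url).query)
    values = query.get("utmt")
    if not values:
        return "page"
    value = values[0]
    return value if value in KNOWN_TYPES else "unknown"
```

For example, `classify_gif_request("https://collector.example/__utm.gif?utmt=tran")` returns `"tran"`, which a downstream step could use to route transaction hits to the ecommerce tables.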

Digital data aggregation
With the clean, structured, quality data produced by data cleansing, enrichment and transformation, aggregation can now be performed to obtain the desired data set.
Algorithm 1 shows the high-level process of aggregating the digital data. The data from all digital platforms (web, mobile and tablet) are merged into one single data source. Since all the digital data share the same structure, a UNION operation in BigQuery can merge the multiple datasets. The merged data table is named 'clickStreamRecords'. Algorithm 3 takes this data as input. The first step of the algorithm is to extract the visitId and visitorId of the customer on an hourly, daily, weekly and monthly basis; the result is stored as D_FlightSearch.
After that, the data are aggregated to obtain the number of flights, the number of unique users performing flight searches and the total number of searches, grouped by each selected route (origin and destination), search date and departure date. Furthermore, search lead-days are calculated by subtracting the search date from the departure date.
This gives how many days before departure the customer searched for the flight. The output of this algorithm is stored as D_uniqVisitorByRoute, D_uniqFlightSearchByRoute, and D_NoOfFlightSearchByRoute. A sample of the final aggregated dataset is shown in Table 2. The reports produced from the final stage of the aggregated dataset are shown in Figure 5. The digital airline website tracking results in Figure 5 summarize how many visitors visit each day and how many users log on to the website a second time. They also show how many sessions are active and how long the user sessions were active. From this analysis, it is noticed that flight fares and seats could be decided based on users' search patterns. The digital variable data of a few routes were analyzed against seat sales using correlation analysis to identify the best and worst routes, which are shown in Table 2. The analysis also showed that the meta search rate is higher in bookings as well. From this, it is observed that users rely on meta search to book flight seats. From all the digital variable data, transactional data and operational data, the seat sales were predicted, as shown in Figure 6. The forecast results shown in Figure 6 are part of the analysis. The graph in Figure 6 shows that the predicted values deviate from the actual values by about 6.5% to 9%. To predict more accurately, hybrid models combining ANN and ARIMA will be implemented in further research work.
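The aggregation described in this section (merging the platform data, computing search lead-days, and counting total searches and unique visitors per route) can be sketched in plain Python as follows. This is a minimal in-memory illustration under stated assumptions, not the BigQuery implementation used in the study: the record field names are hypothetical, and the merge is a simple concatenation that keeps duplicate rows, comparable to UNION ALL.

```python
from collections import defaultdict


def aggregate_flight_searches(*platform_records):
    """Merge per-platform records and aggregate searches by route and dates.

    Each record is a dict with the hypothetical keys 'visitor_id', 'origin',
    'destination', 'search_date' and 'departure_date' (datetime.date values).
    """
    # Merge all platforms into one source (duplicates kept, like UNION ALL).
    click_stream_records = [r for records in platform_records for r in records]

    groups = defaultdict(lambda: {"visitors": set(), "searches": 0})
    for r in click_stream_records:
        # Lead-days: departure date minus search date.
        lead_days = (r["departure_date"] - r["search_date"]).days
        key = (r["origin"], r["destination"],
               r["search_date"], r["departure_date"], lead_days)
        groups[key]["visitors"].add(r["visitor_id"])
        groups[key]["searches"] += 1

    # One output row per route, search date and departure date.
    return [
        {
            "origin": key[0],
            "destination": key[1],
            "search_date": key[2],
            "departure_date": key[3],
            "lead_days": key[4],
            "unique_visitors": len(groups[key]["visitors"]),
            "total_searches": groups[key]["searches"],
        }
        for key in sorted(groups)
    ]
```

Each platform's records are passed as a separate list, for example `aggregate_flight_searches(web_records, mobile_records, tablet_records)`, and the returned rows correspond to the per-route counts that feed the later correlation and forecasting steps.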

CONCLUSION AND FUTURE WORK
In this paper, the main approach is to select the important variables for the daily flight sales forecast at the route level. JavaScript is used for event tracking and web data tracking. From the digital click stream data, the five most prominent variables were extracted to capture visitor traffic, flight search transactions, device data and channel data. These five selected variables will be used to build models for predictive analytics such as seat sales prediction, revenue optimization with digital and transaction data, channel attribution, and customer lifetime value, which could bring tremendous business value. Using the proposed correlation analysis of the extracted variables, the model produced error rates of around 7% and 9% when forecasting 30 days and 60 days ahead respectively. This paper discussed only the requirements and design constraints of the dynamic models. In our next paper, the dynamic predictive models will be described in detail, with suitable analysis results, to forecast seat sales dynamically from the extracted real-time digital data.