The Problem¶

You recently started Aviato.com, a startup that helps airlines set ticket prices.

Aviato's success will depend on a function called pricing_function. This notebook already includes a very simple version of pricing_function. You will modify pricing_function to maximize the total revenue collected for all flights in our simulated environment.

For each flight, pricing_function will be run once per (simulated) day to set that day's ticket price. The seats you don't sell today will be available to sell tomorrow, unless the flight leaves that day.

Your pricing_function is run for one flight at a time, and it takes following inputs:

Number of days until the flight
Number of seats they have left to sell
A variable called demand_level that determines how many tickets you can sell at any given price.

The quantity you sell at any price is:

quantity_sold = demand_level - price

Ticket quantities are capped at the number of seats available.

Your function will output the ticket price.

You learn the demand_level for each day at the time you need to make predictions for that day. For all days in the future, you only know demand_level will be drawn from the uniform distribution between 100 and 200. So, for any day in the future, it is equally likely to be each value between 100 and 200.

In case this is still unclear, some relevant implementation code is shown below.

The Simulator¶

We will run your pricing function in a simulator to test how well it performs on a range of flight situations. Run the following code cell to set up your simulation environment:

import seaborn as sb
import matplotlib.pyplot as plt
%matplotlib inline

import sys
sys.path.append('../input')
from flight_revenue_simulator import simulate_revenue, score_me

def pricing_function(days_left, tickets_left, demand_level):
    return searchForBestPrice(int(demand_level), int(tickets_left), int(days_left))

The Solution¶

In the following sections, we are proposing an efficiency & performance well ballanced optimization appraoch, by mean revenue estimation. It is not the best performing solution based on the Kaggle discussion so far, but much faster than the optimal solution.

At the end of this notebook, we have discussions on how to further improve the solution.

Find the maximal revenue for one day, given total available tickets¶

For daily demond $m$, we plan to sell $k$ tickets that maximize the one day revenue $y = (m - k)\ k$, W.R. $ k \leq n$

$$ \frac{d\ y}{d\ k} = m - 2k = 0$$$$ k^* = \begin{cases} \frac{m}{2}, & \mbox{if } m \leq 2n \\ n, & \mbox{if } m > 2n \end{cases} $$

Given $m$ is uniformly distributed in the range of $[100, 200]$, we derive the expected revenue for total $n$ tickets in one day, in following cases:

$$ \mbox{if } n \in [0, 50) \begin{cases} k & = n \\ p & = m - n \\ E(v) & = E((m - n) n / n) = E(m) - n = 150 - n \end{cases} $$

$$ \mbox{if } n \in [50, 100) \begin{cases} k & = \begin{cases} \frac{m}{2}, & \mbox{if } m \leq 2n \\ n, & \mbox{if } m > 2n \end{cases} \\ p & = m - k \\ v & = \frac{p \ k}{n} \\ & = \frac{(m \ k - k^2)}{n} \\ v & = \begin{cases} \frac{m^2}{4n}, & \mbox{if } m \leq 2n \\ m - n, & \mbox{if } m > 2n \end{cases} \\ E(v) & = P(m \leq 2n) \frac{E(m^2 | m \leq 2n)}{4n} + P(m > 2n) (E(m | m > 2n) - n) \\ & = \frac{n - 50}{50} \frac{ (50 + n)^2 + \frac{(n - 50)^2}{3} }{4n} + \frac{100 - n}{50} (100 + n - n) \\ & = \frac{n - 50}{50} \frac{ 3 (50 + n)^2 + (n - 50) ^ 2 } {12 n} + 2(100 - n) \\ & = (n - 50) \frac{ 3 (50 + n)^2 + (n - 50) ^ 2 } {600 n} + 200 - 2n \end{cases} $$

$$ \mbox{if } n \in [100, +\infty) \begin{cases} k & = \frac{m}{2} \\ p & = \frac{m}{2} \\ E(v) & = \frac{E(m^2)}{4n} \\ & = \frac{150^2 + \frac{100^2}{12}}{4n} \end{cases} $$

# assuming the daily demand is uniform distribution within [100, 200]
def calculateExpectedTicketSellingPrice(_ticketsCount):
    assert (_ticketsCount > 0)
    if _ticketsCount < 50:
        # case [0, 50)
        return 150 - _ticketsCount
    elif _ticketsCount < 100:
        # case [50, 100)
        return (_ticketsCount - 50) * (3 * (50 + _ticketsCount) ** 2 + (_ticketsCount - 50) ** 2) / (600 * _ticketsCount) \
            + 200 - 2 * _ticketsCount
    else:
        # case [100, \infty)
        return (150 * 150 * 12 + 100 * 100) / (48 * _ticketsCount)

# total revenue list for 200 tickets
totalRevenueList = [0] + [calculateExpectedTicketSellingPrice(i) * i for i in range(1, 201)]
print (calculateExpectedTicketSellingPrice(40) * 40)
print (calculateExpectedTicketSellingPrice(50) * 50)
print (calculateExpectedTicketSellingPrice(70) * 70)
print (calculateExpectedTicketSellingPrice(90) * 90)
print (calculateExpectedTicketSellingPrice(100) * 100)
print (calculateExpectedTicketSellingPrice(120) * 120)
print (calculateExpectedTicketSellingPrice(130) * 130)

Find the maximal revenue for N days, given total M tickets¶

To find the maximal revenue for N days, we are exploiting the one day revenue functions defined above. By plotting the revenue function bellow, we can clearly see its curvature as concave, which is excellent for optimization.

# Plot the one day revenue - tickets relationship
# It's a clear concave shape
fig_ = plt.figure()
fig_.suptitle("Total Revenue / Total Tickets")
plt.ylabel('total revenue')
plt.xlabel('total tickets')
plt.plot(totalRevenueList)

As such, for total m tickets, the total revenue over 2 days $R_2(m)$ follows

$$ R_2(m) = \sup_{x,y}(R_1(x) + R_1(y)) = 2 R_1(\frac{x + y}{2}) \\ \mbox{W.R. } x + y = m \\ x^* = y^* = \frac{m}{2} $$

Where the $x$, and $y$ refers to the tickets count in the first and second days.

By extending the above theory, for total M days, the total revenue reaches maximal when we divide the $m$ tickets evenly into $n$ days, which will be

$$R_n(m) = n\ R_1(\frac{m}{n})$$

Please pay attention to the integer rounding here as we cannot sell fraction of tickets.

def findTotalRevenueOverDays(_totalTickets, _totalDays):
    tickets_ = _totalTickets // _totalDays
    # for perday tickets size more than 200, there is no more changes on revenue, as the demand is upbounded
    if tickets_ > 199:
        tickets_ = 199
    overflowDays_ = _totalTickets - tickets_ * _totalDays
    return overflowDays_ * totalRevenueList[tickets_ + 1] + (_totalDays - overflowDays_) * totalRevenueList[tickets_]

#print(findTotalRevenueOverDays(500, 5))
#print(findTotalRevenueOverDays(500, 50))
print(findTotalRevenueOverDays(500, 800))

Find the maximal total expected revenue for a given day demand and left over days¶

The total revenue for $n$ days and $m$ tickets can be derived by

$$ R_{total}(m) = \sup_{0 < x < m}(R_{today}(x) + R_{n - 1}(m - x)) $$

Please note that $R_{today}(x)$ is not a expectation number, but something can be precisely derived by today's demand.

To optimize the total expected revanue, we will perform a linear search on the variable x, and find optimal revanue.

# calculate precise demand for one day
def calculateOnedayRevenue(_demands, _tickets):
    # R_today(_demands, t) = (_demands - t) * t
    # t* = _demands / 2
    if _tickets <= _demands / 2:
        return (_demands - _tickets) * _tickets, _demands - _tickets
    else:
        t = _demands // 2
        return (_demands - t) * t, _demands - t

print(calculateOnedayRevenue(140, 30))

# linear search for best price
def searchForBestPrice(_todayDemand, _totalTickets, _totalDays):
    if _totalDays == 1:
        return calculateOnedayRevenue(_todayDemand, _totalTickets)[1]
    
    rev_ = -1
    price_ = 1
    leftDays_ = _totalDays - 1
    for t in range(1, _totalTickets + 1):
        r, p = calculateOnedayRevenue(_todayDemand, t)        
        if _totalTickets > t and _totalDays > 1:
            r += findTotalRevenueOverDays(_totalTickets - t, _totalDays - 1)
        
        if r > rev_:
            rev_ = r
            price_ = p
    
    return price_#, rev_

print(searchForBestPrice(130, 500, 10))
print(searchForBestPrice(180, 500, 10))

Experiments and Future Works¶

By running the simulation and scoring bellow, we get total score of 6912, which is around 3ed position based on the Kaggle discussion board.

This solution is based on the sum of mean estimation of the future daily revenues, and use that to ballance today's price. However, this is not the optimal estimation approach, especially for days far away from launch.

An more optimal solution would involve the dynamic programming over the distribution of revenues of each day's revenue (instead of simple point estimation). However, it is way more costy than mean estimation thus not practical in the practice.

One alternative to the dynamic programming appraoch could be performing distribution computation for a fixed range of days, and use the sum of mean esitmation of that multi-day distribution to estimate future revenues. It will be a good trade-off between accuracy and performance.

simulate_revenue(days_left=7, tickets_left=50, pricing_function=pricing_function, verbose=True)

score_me(pricing_function)