PAM Read-Modify-Write¶

This notebook introduces the basic read-modify-write use case of PAM:

Read: Load activity plans from existing data (either tabular or MATSim)
Modify: Use the PAM API to modify the activity plans
Write: Write activity plans back to disk in the chosen format

For this example, we use policies to make our modifications. But you might also try the following:

spatial sampling
location modelling
rescheduling
adding noise
simulating aging or the passing of time
and so on...

In [1]:

import os
from collections import defaultdict

import geopandas as gp
import pandas as pd
from matplotlib import pyplot as plt
from pam import policy, read
from pam.policy import apply_policies

%matplotlib inline

Load Data¶

Here we load simple travel diary data for London commuters. This is a 0.1% sample of work and education commutes from the 2011 census. Because we're sharing this data, we've aggregated locations to borough level and randomized personal attributes, so don't get too excited about the results.

The data is available in the data/example_data sub-directory. All data paths in this example are relative to the notebook directory in the PAM repository.

In [2]:

trips = pd.read_csv(
    os.path.join("data", "example_data", "example_travel_diaries.csv"), index_col="uid"
)
attributes = pd.read_csv(
    os.path.join("data", "example_data", "example_attributes.csv"), index_col="pid"
)

In [3]:

trips.head(10)

Out[3]:

	pid	hid	seq	hzone	ozone	dzone	purp	mode	tst	tet	freq
uid
0	census_0	census_0	0	Harrow	Harrow	Camden	work	pt	444	473	1000
1	census_0	census_0	1	Harrow	Camden	Harrow	work	pt	890	919	1000
2	census_1	census_1	0	Greenwich	Greenwich	Tower Hamlets	work	pt	507	528	1000
3	census_1	census_1	1	Greenwich	Tower Hamlets	Greenwich	work	pt	1065	1086	1000
4	census_2	census_2	0	Croydon	Croydon	Croydon	work	pt	422	425	1000
5	census_2	census_2	1	Croydon	Croydon	Croydon	work	pt	917	920	1000
6	census_3	census_3	0	Haringey	Haringey	Redbridge	work	pt	428	447	1000
7	census_3	census_3	1	Haringey	Redbridge	Haringey	work	pt	1007	1026	1000
8	census_4	census_4	0	Hounslow	Hounslow	Westminster,City of London	work	car	483	516	1000
9	census_4	census_4	1	Hounslow	Westminster,City of London	Hounslow	work	car	1017	1050	1000
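
The `tst` and `tet` columns appear to hold trip start and end times as minutes after midnight. This is an assumption based on the values shown above, not something the data states. Under that assumption, a small helper (hypothetical, not part of PAM) converts them to clock times:

```python
def minutes_to_clock(minutes):
    """Convert minutes after midnight to an HH:MM string (assumed unit)."""
    hours, mins = divmod(int(minutes), 60)
    return f"{hours:02d}:{mins:02d}"


# First trip of census_0 in the table above: tst=444, tet=473.
start = minutes_to_clock(444)
end = minutes_to_clock(473)
print(start, end)
```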

Read¶

First, we load the example travel diary data into activity plans. This data represents the 2011 baseline London population of commuters.

In [4]:

population = read.load_travel_diary(trips, attributes, trip_freq_as_person_freq=True)

Using tour based purpose parser (recommended)

Adding pid->hh mapping to persons_attributes from trips.

Adding home locations to persons attributes using trips attributes.

Using freq of 'None' for all trips.

Let's check out an example Activity Plan and Attributes:

In [5]:

household = population.households["census_12"]
person = household.people["census_12"]
person.print()

Person: census_12
{'gender': 'female', 'job': 'education', 'occ': 'white', 'inc': 'high', 'hzone': 'Croydon'}
0:	Activity(act:home, location:Croydon, time:00:00:00 --> 07:06:00, duration:7:06:00)
1:	Leg(mode:pt, area:Croydon --> Tower Hamlets, time:07:06:00 --> 07:45:00, duration:0:39:00)
2:	Activity(act:education, location:Tower Hamlets, time:07:45:00 --> 15:54:00, duration:8:09:00)
3:	Leg(mode:pt, area:Tower Hamlets --> Croydon, time:15:54:00 --> 16:33:00, duration:0:39:00)
4:	Activity(act:home, location:Croydon, time:16:33:00 --> 00:00:00, duration:7:27:00)
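
As a quick sanity check, the five components printed above should cover the whole day. The sketch below re-enters the durations by hand using only the standard library (it does not touch PAM objects) and sums them:

```python
from datetime import timedelta

# Durations copied by hand from the printed plan for census_12.
durations = [
    timedelta(hours=7, minutes=6),   # home
    timedelta(minutes=39),           # leg: Croydon -> Tower Hamlets
    timedelta(hours=8, minutes=9),   # education
    timedelta(minutes=39),           # leg: Tower Hamlets -> Croydon
    timedelta(hours=7, minutes=27),  # home
]

# Start the sum from a zero timedelta, since sum() defaults to the int 0.
total = sum(durations, timedelta())
print(total == timedelta(hours=24))
```

The activities and legs alternate and together account for exactly 24 hours.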

Before modifying any activities, we create a simple function to extract some example statistics. We include this as a simple demo, but would love to add more.

Note that activity plans allow us to consider detailed joint segmentations, such as socio-economic, spatial, temporal, modal, activity sequence and so on.
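
To illustrate what a joint segmentation means in practice, here is a minimal sketch that counts trips by income band and mode together. The records are hypothetical stand-ins written out by hand, not values read from the PAM population:

```python
from collections import Counter

# Hypothetical stand-in records: (income band, mode, destination).
example_trips = [
    ("low", "pt", "Camden"),
    ("high", "car", "Westminster,City of London"),
    ("high", "pt", "Camden"),
    ("low", "pt", "Camden"),
]

# Count trips for each joint (income, mode) segment.
segments = Counter((inc, mode) for inc, mode, _dest in example_trips)
for segment, count in sorted(segments.items()):
    print(segment, count)
```

The same pattern extends to any combination of attributes by widening the tuple used as the key.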

In [6]:

def print_simple_stats(population):
    """Print some simple population statistics."""
    time_at_home = 0
    travel_time = 0
    low_income_central_trips = 0
    high_income_central_trips = 0

    for hh in population.households.values():
        for person in hh.people.values():
            freq = person.freq

            for p in person.plan:
                if p.act == "travel":
                    duration = p.duration.seconds * freq / 3600
                    travel_time += duration

                    if p.end_location.area == "Westminster,City of London":
                        if person.attributes["inc"] == "low":
                            low_income_central_trips += freq

                        elif person.attributes["inc"] == "high":
                            high_income_central_trips += freq

                else:  # activity
                    if p.act == "home":
                        duration = p.duration.seconds * freq / 3600
                        time_at_home += duration

    print(f"Population total time at home: {time_at_home/1000000:.2f} million hours")
    print(f"Population total travel time: {travel_time/1000000:.2f} million hours")
    print(f"Low income trips to Central London: {low_income_central_trips} trips")
    print(f"High income trips to Central London: {high_income_central_trips} trips")

In [7]:

print_simple_stats(population)

Population total time at home: 0.76 million hours
Population total travel time: 0.03 million hours
Low income trips to Central London: 3000 trips
High income trips to Central London: 4000 trips

In [8]:

def plot_simple_stats(population):
    """Plot some simple population statistics."""
    geoms = gp.read_file(os.path.join("data", "example_data", "geometry.geojson"))

    departures = defaultdict(int)
    arrivals = defaultdict(int)

    for _hid, hh in population.households.items():
        for _pid, person in hh.people.items():
            freq = person.freq

            for p in person.plan:
                if p.act == "travel":
                    departures[p.start_location.area] += freq
                    arrivals[p.end_location.area] += freq
    geoms["departures"] = geoms.NAME.map(departures)
    geoms["arrivals"] = geoms.NAME.map(arrivals)

    fig, ax = plt.subplots(1, 2, figsize=(16, 6))
    for i, name in enumerate(["departures", "arrivals"]):
        ax[i].title.set_text(name)
        geoms.plot(name, ax=ax[i])
        ax[i].axis("off")

In [9]:

plot_simple_stats(population)

[Figure: two maps of the example zones, shaded by number of departures (left) and arrivals (right).]

Modify¶

Our 2011 baseline London population of commuters seems sensible: in this sample they spend about 0.76 million hours at home and 0.03 million hours travelling.

But what if we want to build some more up-to-date scenarios?

We consider two scenarios from a combination of policies:

Scenario A - Do Minimum:

A household will be quarantined with p=0.025 (for example due to a positive virus test within the household)
A person will stay at home (self-isolating) with p=0.1 (for example due to being a vulnerable person)

Scenario B - Lockdown:

As above, plus education and work activities will be removed, and plans adjusted, with p=0.9 (for example because schools and workplaces are closed)
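
To get a feel for these probabilities, suppose the household quarantine and self-isolation draws are independent. That is an assumption made for illustration; how PAM samples them is not shown here. The chance that a given person is kept at home by at least one of the two Scenario A policies is then:

```python
p_quarantine = 0.025  # household quarantined
p_isolate = 0.1       # person stays at home

# Probability that at least one of two independent events occurs.
p_either = 1 - (1 - p_quarantine) * (1 - p_isolate)
print(f"{p_either:.4f}")
```

Under that assumption, roughly 12% of people would stay at home in the Do Minimum scenario.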

In [10]:

policy1 = policy.HouseholdQuarantined(probability=0.025)
policy2 = policy.PersonStayAtHome(probability=0.1)
policy3 = policy.RemoveHouseholdActivities(["education", "work"], probability=0.9)

do_minimum = apply_policies(population, [policy1, policy2])
lockdown = apply_policies(population, [policy1, policy2, policy3])

In [11]:

print_simple_stats(do_minimum)
plot_simple_stats(do_minimum)

Population total time at home: 0.67 million hours
Population total travel time: 0.02 million hours
Low income trips to Central London: 3000 trips
High income trips to Central London: 4000 trips

In [12]:

print_simple_stats(lockdown)
plot_simple_stats(lockdown)

Population total time at home: 0.03 million hours
Population total travel time: 0.00 million hours
Low income trips to Central London: 1000 trips
High income trips to Central London: 0 trips
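
The drop in time at home may look surprising for scenarios that keep people at home. One possible explanation, assuming plan durations are standard `datetime.timedelta` objects (the printed plan earlier suggests this but does not confirm it), is that `timedelta.seconds` holds only the sub-day remainder, so an activity lasting exactly 24 hours contributes zero. `total_seconds()` includes whole days:

```python
from datetime import timedelta

full_day = timedelta(hours=24)           # e.g. a person at home all day
morning = timedelta(hours=7, minutes=6)  # a shorter home activity

# .seconds drops whole days; .total_seconds() keeps them.
print(full_day.seconds, full_day.total_seconds())
print(morning.seconds, morning.total_seconds())
```

If that is the cause, swapping `p.duration.seconds` for `p.duration.total_seconds()` in `print_simple_stats` would count full-day activities.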

Write¶

Assuming we are happy with our modified activity sequences, we can write them to disk in our desired format. For this example we haven't prepared the population for MATSim, so we write to disk as travel plans/diaries:

In [13]:

do_minimum.to_csv(os.path.join("tmp", "do_min"))
lockdown.to_csv(os.path.join("tmp", "lockdown"))
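
Whether `to_csv` creates missing output directories is not shown here, so creating them beforehand is a harmless precaution. A minimal sketch using only the standard library, where `ensure_dir` is a hypothetical helper rather than a PAM function:

```python
import os
import tempfile


def ensure_dir(path):
    """Create path (and any parents) if missing; do nothing if it exists."""
    os.makedirs(path, exist_ok=True)
    return path


# Demonstrated in a throwaway location; in the notebook the argument
# would be os.path.join("tmp", "do_min") or os.path.join("tmp", "lockdown").
base = tempfile.mkdtemp()
out_dir = ensure_dir(os.path.join(base, "tmp", "do_min"))
print(os.path.isdir(out_dir))
```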