{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Tabular Read-Write\n", "\n", "This notebook is an introduction to the PAM tabular read-write methods. It has two parts:\n", "\n", "1. [Read](#read-tabular-format)\n", "2. [Write](#write-tabular-data)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2020-05-18T14:31:06.631005Z", "start_time": "2020-05-18T14:31:06.133057Z" }, "execution": { "iopub.execute_input": "2023-09-22T10:52:12.682460Z", "iopub.status.busy": "2023-09-22T10:52:12.682315Z", "iopub.status.idle": "2023-09-22T10:52:13.624391Z", "shell.execute_reply": "2023-09-22T10:52:13.624086Z" } }, "outputs": [], "source": [ "import os\n", "\n", "import pandas as pd\n", "\n", "from pam import read" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Read Tabular Format\n", "\n", "PAM can read from either tabular or MATSim formats. Tabular data is read with the `pam.read.load_travel_diary` function, which will try to automatically infer trips and activities from commonly formatted travel diary data.\n", "\n", "Tabular data should include a trips table and, optionally, attributes tables for persons and/or households. Tabular data is expected as pandas DataFrames with column names as described in the docs and/or as in the following example.\n", "\n", "The following demonstration data is available in the [`data/example_data`](https://github.com/arup-group/pam/tree/main/examples/data/example_data) directory. All data paths in this example are relative to the [notebook directory](https://github.com/arup-group/pam/tree/main/examples) in the PAM repository.\n", "\n", "#### Step 1:\n", "\n", "Load your trips (and attributes) data into pandas DataFrames. Reformat and rename the columns as required (please read the docs). 
The following example already has the required data types and column names:\n", "\n", "**trips:**\n", "\n", "Each row represents a trip, where:\n", "\n", "- **pid**: person id of trip\n", "- **hid**: household id of trip (**optional**)\n", "- **seq**: sequence of trip within day (**optional** if order is already correct)\n", "- **hzone**: home zone of person (**optional**)\n", "- **ozone**: origin zone of trip\n", "- **dzone**: destination zone of trip\n", "- **purp**: purpose of trip (note that other ways of classifying purpose are supported - read the docs!)\n", "- **mode**: trip mode\n", "- **tst**: (integer) trip start time in minutes from start of day (typically from midnight)\n", "- **tet**: (integer) trip end time in minutes from start of day, as for **tst**\n", "- **freq**: sample weighting (**optional**)\n", "\n", "**persons:**\n", "\n", "Each row represents a person's attributes. These can be arbitrary key-value pairs, with most types supported. The following are examples:\n", "\n", "- **pid**: person id, must be consistent with trips data (**required**)\n", "- gender: gender of person (example)\n", "- job: employment status of person (example)\n", "- occ: employment type of person (example)\n", "- inc: income of person (example)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2020-05-18T13:40:08.588305Z", "start_time": "2020-05-18T13:40:08.557000Z" }, "execution": { "iopub.execute_input": "2023-09-22T10:52:13.626292Z", "iopub.status.busy": "2023-09-22T10:52:13.626133Z", "iopub.status.idle": "2023-09-22T10:52:13.637252Z", "shell.execute_reply": "2023-09-22T10:52:13.636957Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " pid hid seq hzone ozone \\\n", "uid \n", "0 census_0 census_0 0 Harrow Harrow \n", "1 census_0 census_0 1 Harrow Camden \n", "2 census_1 census_1 0 Greenwich Greenwich \n", "3 census_1 census_1 1 Greenwich Tower Hamlets \n", "4 census_2 census_2 0 Croydon Croydon \n", "5 census_2 census_2 1 Croydon Croydon \n", "6 census_3 census_3 0 Haringey Haringey \n", "7 census_3 census_3 1 Haringey Redbridge \n", "8 census_4 census_4 0 Hounslow Hounslow \n", "9 census_4 census_4 1 Hounslow Westminster,City of London \n", "\n", " dzone purp mode tst tet freq \n", "uid \n", "0 Camden work pt 444 473 1000 \n", "1 Harrow work pt 890 919 1000 \n", "2 Tower Hamlets work pt 507 528 1000 \n", "3 Greenwich work pt 1065 1086 1000 \n", "4 Croydon work pt 422 425 1000 \n", "5 Croydon work pt 917 920 1000 \n", "6 Redbridge work pt 428 447 1000 \n", "7 Haringey work pt 1007 1026 1000 \n", "8 Westminster,City of London work car 483 516 1000 \n", "9 Hounslow work car 1017 1050 1000 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips = pd.read_csv(\n", " os.path.join(\"data\", \"example_data\", \"example_travel_diaries.csv\"), index_col=\"uid\"\n", ")\n", "persons = pd.read_csv(\n", " os.path.join(\"data\", \"example_data\", \"example_attributes.csv\"), index_col=\"pid\"\n", ")\n", "trips.head(10)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2023-09-22T10:52:13.638783Z", "iopub.status.busy": "2023-09-22T10:52:13.638688Z", "iopub.status.idle": "2023-09-22T10:52:13.642330Z", "shell.execute_reply": "2023-09-22T10:52:13.642036Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " gender job occ inc\n", "pid \n", "census_0 female work white low\n", "census_1 female work white low\n", "census_2 male work blue high\n", "census_3 male work blue low\n", "census_4 male work blue medium\n", "census_5 other education white medium\n", "census_6 female work blue low\n", "census_7 male education white high\n", "census_8 female work blue medium\n", "census_9 female work white low" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "persons.head(10)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 2:\n", "\n", "Load the travel diary data:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2020-05-18T13:40:17.617442Z", "start_time": "2020-05-18T13:40:08.632419Z" }, "execution": { "iopub.execute_input": "2023-09-22T10:52:13.646304Z", "iopub.status.busy": "2023-09-22T10:52:13.646136Z", "iopub.status.idle": "2023-09-22T10:52:13.689782Z", "shell.execute_reply": "2023-09-22T10:52:13.689482Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using tour based purpose parser (recommended)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Adding pid->hh mapping to persons_attributes from trips.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Adding home locations to persons attributes using trips attributes.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Using freq of 'None' for all trips.\n" ] } ], "source": [ "population = read.load_travel_diary(trips, persons, trip_freq_as_person_freq=True)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2020-04-09T14:32:22.432201Z", "start_time": "2020-04-09T14:32:15.568791Z" } }, "source": [ "#### Step 3:\n", "\n", "Check everything is as expected. 
PAM will try to infer activities from trip data, including for arbitrarily complex sequences of nested tours.\n", "\n", "However, trip purpose can be encoded in a variety of ways. PAM will try to make sensible inferences based on the data provided. If something looks wrong, check the docs first, then consider raising an issue. The team are keen to support you!" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2020-05-18T13:40:17.640594Z", "start_time": "2020-05-18T13:40:17.621499Z" }, "execution": { "iopub.execute_input": "2023-09-22T10:52:13.691340Z", "iopub.status.busy": "2023-09-22T10:52:13.691257Z", "iopub.status.idle": "2023-09-22T10:52:13.693510Z", "shell.execute_reply": "2023-09-22T10:52:13.693223Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Person: census_12\n", "{'gender': 'female', 'job': 'education', 'occ': 'white', 'inc': 'high', 'hzone': 'Croydon'}\n", "0:\tActivity(act:home, location:Croydon, time:00:00:00 --> 07:06:00, duration:7:06:00)\n", "1:\tLeg(mode:pt, area:Croydon --> Tower Hamlets, time:07:06:00 --> 07:45:00, duration:0:39:00)\n", "2:\tActivity(act:education, location:Tower Hamlets, time:07:45:00 --> 15:54:00, duration:8:09:00)\n", "3:\tLeg(mode:pt, area:Tower Hamlets --> Croydon, time:15:54:00 --> 16:33:00, duration:0:39:00)\n", "4:\tActivity(act:home, location:Croydon, time:16:33:00 --> 00:00:00, duration:7:27:00)\n" ] } ], "source": [ "household = population.households[\"census_12\"]\n", "person = household.people[\"census_12\"]\n", "person.print()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2023-09-22T10:52:13.694877Z", "iopub.status.busy": "2023-09-22T10:52:13.694784Z", "iopub.status.idle": "2023-09-22T10:52:13.800437Z", "shell.execute_reply": "2023-09-22T10:52:13.800168Z" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "person.plot()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Write Tabular Data\n", "\n", "PAM can write to a tabular format using `pam.write.to_csv`. This outputs tables of trip legs, household attributes and person attributes. Where sufficient geometries are found, PAM will write spatial data as GeoJSON." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2020-05-18T13:40:19.159176Z", "start_time": "2020-05-18T13:40:19.133774Z" }, "execution": { "iopub.execute_input": "2023-09-22T10:52:13.801863Z", "iopub.status.busy": "2023-09-22T10:52:13.801765Z", "iopub.status.idle": "2023-09-22T10:52:13.834658Z", "shell.execute_reply": "2023-09-22T10:52:13.834253Z" } }, "outputs": [], "source": [ "from pam import write\n", "\n", "write.to_csv(population, dir=\"tmp\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PAM can also write directly to origin-destination (O-D) matrices using `pam.write.write_od_matrices`. The output can optionally be segmented (read the docs). 
However, this method does not currently support trip weighting (frequency).\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2023-09-22T10:52:13.836946Z", "iopub.status.busy": "2023-09-22T10:52:13.836776Z", "iopub.status.idle": "2023-09-22T10:52:13.851430Z", "shell.execute_reply": "2023-09-22T10:52:13.850697Z" } }, "outputs": [], "source": [ "write.write_od_matrices(population, \"tmp\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Pickle\n", "\n", "This is not a tabular format, but if you've read this far, you might like to know that there is a `Population.pickle` method:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2020-05-18T13:40:23.298011Z", "start_time": "2020-05-18T13:40:22.714290Z" }, "execution": { "iopub.execute_input": "2023-09-22T10:52:13.853673Z", "iopub.status.busy": "2023-09-22T10:52:13.853534Z", "iopub.status.idle": "2023-09-22T10:52:13.860152Z", "shell.execute_reply": "2023-09-22T10:52:13.859767Z" } }, "outputs": [], "source": [ "population.pickle(os.path.join(\"tmp\", \"population.pickle\"))" ] } ], "metadata": { "kernelspec": { "display_name": "pam", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": true, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of 
Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "248.333px" }, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }