An exploration project of bus data from Data.Gov.HK and GTFS
Recently, I noticed that Google Maps does not always show the best route, so I did some digging and wrote up this project page on some of my findings and work. It turns out that the Hong Kong government has published all of its transport data through its own APIs and database files on Data.Gov.HK.
This project was divided into two main parts.
The bus data from Data.Gov.HK was mapped to GTFS (General Transit Feed Specification), the transit data format used by Google Maps. The script used can be found at the link. Once the data has been mapped to GTFS, a GTFS viewer can be used to view the results; a screenshot is shown below.
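As a rough illustration of what the mapping involves, the sketch below writes two GTFS files, stops.txt and routes.txt, from bus records that have already been parsed. It is not the script linked above. The input keys (id, name_en, lat, lon, company, route_no, origin, destination) and the function names are hypothetical and do not reflect the actual Data.Gov.HK schema; only the GTFS column names and route_type 3 (bus) come from the GTFS reference.

```python
import csv
import io

# route_type 3 means "Bus" in the GTFS reference.
GTFS_ROUTE_TYPE_BUS = 3


def to_stops_txt(stops):
    """Render a GTFS stops.txt file as a string.

    `stops` uses hypothetical keys: id, name_en, lat, lon.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["stop_id", "stop_name", "stop_lat", "stop_lon"])
    for s in stops:
        # GTFS expects WGS84 decimal degrees; six decimals is roughly 0.1 m.
        writer.writerow([s["id"], s["name_en"], f"{s['lat']:.6f}", f"{s['lon']:.6f}"])
    return buf.getvalue()


def to_routes_txt(routes):
    """Render a GTFS routes.txt file as a string.

    `routes` uses hypothetical keys: company, route_no, origin, destination.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["route_id", "agency_id", "route_short_name",
                     "route_long_name", "route_type"])
    for r in routes:
        # Prefix with the company so identical route numbers stay unique.
        route_id = f"{r['company']}-{r['route_no']}"
        long_name = f"{r['origin']} - {r['destination']}"
        writer.writerow([route_id, r["company"], r["route_no"],
                         long_name, GTFS_ROUTE_TYPE_BUS])
    return buf.getvalue()
```

A complete feed also needs files such as agency.txt, trips.txt and stop_times.txt before a viewer can load it; the csv module takes care of quoting stop names that contain commas.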
From the data conversion exercise, the following points were learnt.
In the long term, I believe it would be highly beneficial for Hong Kong to maintain its own GTFS feed as a public dataset. This would give applications like Google Maps the best route information, so that they can advise users on the best travel route. In addition, Google's experience has produced a well-designed file format that can hold much more information than what Data.Gov.HK currently publishes.
With the bus data in hand, I also plotted out some interesting statistics.
The graph below shows that the median for each company is about 30 stops per bus route. Surprisingly, the KMB route with the most stops has about 150, which works out to around 75 stops per trip, since a route covers both directions.
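The per-company median and maximum can be reproduced with a few lines of standard-library Python. This is a minimal sketch assuming the route-stop data has been flattened into hypothetical (company, route_id, direction, stop_id) rows; counting every row per route includes both directions, which is how a 150-stop route corresponds to roughly 75 stops per trip.

```python
from collections import Counter, defaultdict
from statistics import median


def stops_per_route_summary(rows):
    """Return {company: (median stops per route, max stops per route)}.

    `rows` is a hypothetical flattened layout with one
    (company, route_id, direction, stop_id) tuple per stop visit.
    Both directions are counted, so a route's total covers two trips.
    """
    stops_per_route = Counter((company, route_id)
                              for company, route_id, _direction, _stop in rows)
    counts_by_company = defaultdict(list)
    for (company, _route_id), n_stops in stops_per_route.items():
        counts_by_company[company].append(n_stops)
    return {company: (median(counts), max(counts))
            for company, counts in counts_by_company.items()}
```

Halving a route's count gives the per-trip figure only when both directions have a similar number of stops.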
The distance between consecutive bus stops is calculated using the inverse function (Geod.inv) in the pyproj library, which works directly from longitude and latitude and returns the distance along the ground. Interestingly, the median distance between bus stops is about 400 m for every company. At a walking pace of 10 minutes per km, a normal person would take about 4 to 5 minutes to walk from one bus stop to the next.
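A minimal sketch of the distance and walking-time calculation follows. It calls pyproj's Geod.inv on the WGS84 ellipsoid when pyproj is installed; otherwise it falls back to a spherical haversine formula, a swapped-in approximation that can differ from the ellipsoidal result by a fraction of a percent. The function names and the ordered (lon, lat) input layout are hypothetical.

```python
import math
from statistics import median

try:
    from pyproj import Geod
    _GEOD = Geod(ellps="WGS84")
except ImportError:  # fall back to a spherical approximation
    _GEOD = None

EARTH_RADIUS_M = 6371008.8  # mean Earth radius, used only by the fallback


def stop_distance_m(lon1, lat1, lon2, lat2):
    """Ground distance in metres between two stops given in degrees."""
    if _GEOD is not None:
        # Geod.inv returns (forward azimuth, back azimuth, distance).
        _, _, dist = _GEOD.inv(lon1, lat1, lon2, lat2)
        return dist
    # Haversine formula on a sphere (approximation, not what pyproj does).
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def consecutive_distances_m(stops):
    """Distances between neighbouring stops of one trip, `stops` = [(lon, lat), ...]."""
    return [stop_distance_m(lon1, lat1, lon2, lat2)
            for (lon1, lat1), (lon2, lat2) in zip(stops, stops[1:])]


def walking_minutes(distance_m, minutes_per_km=10.0):
    """Walking time at a fixed pace (10 min/km by default)."""
    return distance_m * minutes_per_km / 1000.0
```

With the median of 400 m quoted above, walking_minutes(400) gives 4.0 minutes.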
The fare per bus stop is calculated as the full fare divided by the number of bus stops in the trip. Discounted fares that apply when boarding partway along the route are not included in the calculation.
The median value is about HK$0.50 per bus stop for each company, which suggests that the companies follow very similar pricing strategies. The most expensive route per bus stop is NA21, at about HK$5.35 per stop.
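The fare-per-stop figure is one division per trip, followed by a median and a maximum across trips. The sketch below assumes a hypothetical dict mapping a trip identifier to its (full fare in HKD, number of stops) pair; any sample values fed to it are made up, not taken from the dataset.

```python
from statistics import median


def fare_per_stop(full_fare_hkd, num_stops):
    """Full fare divided by the number of stops in the trip."""
    if num_stops <= 0:
        raise ValueError("a trip needs at least one stop")
    return full_fare_hkd / num_stops


def summarise_fares(trips):
    """Return (median fare per stop, most expensive trip id, its fare per stop).

    `trips` is a hypothetical {trip_id: (full_fare_hkd, num_stops)} mapping.
    Section (discounted) fares are ignored, matching the method above.
    """
    per_stop = {trip: fare_per_stop(fare, n) for trip, (fare, n) in trips.items()}
    most_expensive = max(per_stop, key=per_stop.get)
    return median(per_stop.values()), most_expensive, per_stop[most_expensive]
```

For example, summarise_fares({"A": (10.0, 20), "B": (6.0, 12), "C": (12.0, 3)}) returns (0.5, "C", 4.0).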
It is great that the HK government has taken the first step of opening up bus data through the Data.Gov.HK website and APIs. I believe open data will help application developers build better applications for future "Smart Cities".
I believe that the following points still need to be tackled.
For reference, this article shows that one of our Legislative Council members, Charles Mok, raised the issue of adopting the GTFS format back in 2017. I hope that the Hong Kong government will open up more data in the best possible format. This will enable developers to build applications that make people's lives more convenient and efficient.