Map Happenings

Mapping Industry Tidbits, Activity and Musings

Why Geospatial Data is Stuck in the Year 1955

If we take a look back in history, just 66 years to 1956, it was a very different world.

It was in 1956 that the humble shipping container was invented by the American entrepreneur Malcolm McLean1. Prior to this era, ships were loaded with all manner of boxes, wooden crates, containers and sacks. It could take weeks to load or unload a ship. Fast forward to today, and one of those giant container ships can be loaded or unloaded in as little as 24 hours.

Malcolm McLean in 1957 – “The Father of Containerization”
Credit: Wikimedia

The most significant advantage of McLean’s container was that it allowed cargo to be seamlessly transported by sea, road or rail. There was no need to unload the contents of a container from one vessel to another: you simply moved the container directly from the ship to the railroad car or to a truck, and bingo, you were on your way.

As a result of all this, shipping costs plummeted and it was an easy decision for the industry to switch to these containers. Use of the 20’x8’x8’ steel boxes2 quickly became a viral success, and suddenly all new commercial ships were designed around these standard dimensions.

The icing on the cake came in 1968 when the International Organization for Standardization (ISO) adopted McLean’s specifications and they became an official global standard.

The Economist magazine has since called it “The Humble Hero”:

“Although only a simple metal box, it has transformed global trade. In fact, new research suggests that the container has been more of a driver of globalization than all trade agreements in the past 50 years taken together.”

A Modern Day Container Ship — Credit: Wikimedia

In summary two very important things happened here:

  1. A very useful and easy-to-adopt standard was invented
  2. The standard was broadly adopted across the entire globe

So how does this story relate to the geospatial world?

Well, I have to say that the geospatial world is unfortunately very comparable to the world of international shipping prior to McLean’s 1956 invention. We are still mostly in the year 1955.

Like the contents inside the ~20,000,000 shipping containers in use around the world, there are mountains and mountains of geospatial data in use today. With the advent of ever cheaper data collection devices, ever more powerful computing and ever increasing storage capacity, all multiplied by the sheer demand for geospatial data, these mountains are growing exponentially.

But unlike the world of containers, all these data are actually incredibly hard to use. The geospatial world is still stuck in the era of miscellaneous boxes, wooden crates, containers and sacks. So, to take a particular type of geospatial data from one system and use it in another, one has to go through a laborious process:

  • First, gain an understanding of what, exactly, is contained in the source data and how the information is organized
  • Then, go through a similar process to understand how the information in the target system needs to be organized so that the target system can make use of it
  • Finally, develop a process to take the data from the source system and translate it into a format the target system can understand.

It’s all pretty horrible.
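To make that third step concrete, here is a minimal sketch in Python of the kind of one-off translation code this forces you to write. All of the field names, classification codes and schemas here are hypothetical, but the shape of the problem is real: every pairing of source and target systems needs its own hand-built mapping like this.

```python
# A hypothetical street centerline record, organized the way one
# source system happens to organize it...
source_record = {
    "STR_NAME": "Main St",
    "FCLASS": 3,              # the source system's own classification codes
    "SPD_LMT_MPH": 30,        # speed limit, in this system's units
}

# ...and a mapping to the shape an (equally hypothetical) target
# system expects. Field names, codes and even units all differ.
CLASS_MAP = {1: "motorway", 2: "arterial", 3: "collector", 4: "local"}

def translate(record: dict) -> dict:
    """Hand-built, one-off mapping from the source schema to the target schema."""
    return {
        "streetName": record["STR_NAME"],
        "roadClass": CLASS_MAP[record["FCLASS"]],
        "speedLimitKph": round(record["SPD_LMT_MPH"] * 1.60934),
    }

print(translate(source_record))
```

Multiply that by every data type, every vendor and every pair of systems, and the cost of having no common container becomes apparent.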

Over the last few decades there have been some valiant attempts to solve this problem. For example, in the world of street map data in the 1990s the United States Geological Survey (USGS) forged ahead with a standard called the Spatial Data Transfer Standard (SDTS), which became a federal standard in 1994. Meanwhile in Europe there was a similar effort, but there it revolved around a standard called the Geographic Data File (GDF), aimed mainly at map data for car navigation systems. Around this time I remember having to attend some excruciatingly tedious international standards meetings where the two sides (SDTS vs. GDF) battled it out, trying to come up with a single standard that everyone could agree on. Needless to say it went nowhere, and since then neither SDTS nor GDF has really taken hold.

You might be thinking: well, what about a shapefile? Isn’t that a standard? Well, yes, it is. But while a shapefile dictates how geospatial data are structured, it says nothing about what’s in the data or how it’s organized. In that sense a shapefile is much like a PDF document: a PDF allows you to exchange documents, but it says almost nothing about what’s in the document, for example whether it’s a PhD dissertation or a picture of a cute kitten.

The lack of common, broadly adopted geospatial data exchange standards is crippling the geospatial industry. It’s a bit like going to an EV charger with your shiny new electric vehicle and discovering you can’t charge it because your car has a different connector to the one used by the EV charger. The electricity is there and ready to be sucked up and used, but, sorry — your vehicle can’t consume it unless you miraculously come up with a magical adaptor that allows the energy to flow. 

Back in the geospatial world there have been a few exceptions to this dilemma that have proven to be quite successful. I’ll provide two examples:

The first relates to Google Maps and public transit schedules. Back in December 2005 the city of Portland, Oregon became the launch city for Google’s Transit Trip Planner, which helped people plan trips using public transit schedules and maps. This app was later folded into Google Maps. Upon launching the feature in Portland, Google obviously wanted to roll it out to as many cities as possible worldwide. However, they quickly discovered that every city published its public transit schedules and routes in different ways. Many of them just published the data in text form, buried somewhere in a PDF document or a web page. It was therefore going to take Google a humongous amount of manual effort to get the data transformed into a format they could ingest and use. To combat this, Google came up with a data exchange specification called the ‘Google Transit Feed Specification’, or GTFS for short. Google pushed the specification hard and eventually it became broadly adopted, so much so that in 2009 its name was changed from ‘Google Transit Feed Specification’ to ‘General Transit Feed Specification’. GTFS is now open and its evolution is facilitated by an independent organization called MobilityData.

The success of GTFS should not be underestimated. It’s like a little standardized shipping container for public transit schedule and route data, and it’s used by thousands of public transport providers around the globe. I think even Disney uses it to publish schedule data for their theme park shuttles.
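Part of what makes GTFS such a good shipping container is how plain it is: a feed is just a ZIP archive of comma-separated text files (stops.txt, routes.txt, trips.txt, stop_times.txt and so on) with agreed-upon column names. Here is a small sketch in Python using a fabricated two-stop excerpt of a stops.txt file; because the columns are standardized, the same few lines of parsing code work for any agency’s feed.

```python
import csv
import io

# A tiny, fabricated excerpt of a GTFS stops.txt file. Real feeds are
# ZIP archives containing several such files (routes.txt, trips.txt, ...).
STOPS_TXT = """\
stop_id,stop_name,stop_lat,stop_lon
1001,Elm & 3rd,45.5231,-122.6765
1002,Oak & 5th,45.5198,-122.6742
"""

def parse_stops(text: str) -> list[dict]:
    """Parse a GTFS stops.txt. The standardized column names mean this
    works for any agency's feed, with no per-city reverse engineering."""
    return [
        {
            "id": row["stop_id"],
            "name": row["stop_name"],
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
        }
        for row in csv.DictReader(io.StringIO(text))
    ]

stops = parse_stops(STOPS_TXT)
print(stops[0]["name"])
```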

Another emerging success is in the world of indoor mapping, which as you may have guessed, is one of my favorite topics. 

While working at Apple I was intimately involved with Apple Maps’ program to provide indoor maps of airports and shopping centers, which launched as a feature of Apple Maps in iOS 11 back in 2017. Like Google with public transit data, we quickly discovered that making indoor maps was an arduous task. In fact the problem was arguably an order of magnitude more complex than dealing with public transport data, as indoor maps are far more intricate than a relatively simple transit schedule. The data for the indoor maps typically came in the form of computer aided design, or ‘CAD’, files. Even though these drawings were electronic, the information contained within them was organized in all manner of different ways. Sometimes walls were indicated by a single line; sometimes by double lines. Some organizations called an area inside a building a ‘room’; others called it a ‘unit’ or ‘space’. The names for specific spaces also differed: was it a ‘toilet’, a ‘restroom’, a ‘W.C.’ or a ‘women’s room’? All in all it was ugly, and it precipitated a huge amount of manual effort to extract the needed information from the source data and convert it into a format usable in Apple Maps.

In an effort to make things easier we hunted around for a data exchange standard that might meet our needs. We found a few, but none fit our particular needs. So, like Google, we embarked upon creating our own specification, initially called ‘Apple Venue Format’ (AVF). This later evolved into something we called ‘Indoor Mapping Data Format’ (IMDF). We realized that to be successful we not only had to create a specification but also needed to convince organizations to adopt it. So we worked with the industry ‘tool makers’ (those who developed software to create, edit and transform geospatial data) to support the specification. This included Esri of course, but also many other players in the business, both large and small, for example Autodesk, MappedIn and Safe Software. With that in place we could start to push building owners and specific vertical industries to publish data using IMDF.

The efforts proved worthwhile and eventually we were extremely honored to have the Open Geospatial Consortium adopt the IMDF specification as a ‘Community Standard’. Like the ISO adopting McLean’s specification for shipping containers, this gave IMDF the endorsement it needed to be broadly adopted. No longer did organizations, particularly public entities, have to be concerned about publishing data using a specification tied to a single large corporation. With the specification endorsed by the leading international geospatial standards body, the fear of showing favoritism to a corporation was effectively removed. Today IMDF is becoming the preferred shipping container for exchanging indoor map data and it’s saving organizations a whole lot of time and energy.
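For the curious, IMDF is built on GeoJSON: a venue is published as a set of feature collections with defined feature types (venue, building, level, unit and so on) and a controlled category vocabulary, so a restroom is always labeled a restroom. The Python sketch below shows roughly what a single unit feature looks like; the identifiers and coordinates are fabricated, and the published OGC specification is the authoritative reference for the actual schema.

```python
import json

# A fabricated IMDF-style "unit" feature. Because IMDF fixes both the
# feature types and the category vocabulary, every producer labels a
# restroom the same way -- no more guessing between 'toilet', 'W.C.'
# and 'women's room'.
UNIT_FEATURE = json.dumps({
    "type": "Feature",
    "feature_type": "unit",
    "properties": {
        "category": "restroom.female",   # from IMDF's controlled vocabulary
        "level_id": "LEVEL-1-ID",        # fabricated identifier
    },
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-122.0, 37.33], [-122.0, 37.3301],
                         [-121.9999, 37.3301], [-121.9999, 37.33],
                         [-122.0, 37.33]]],
    },
})

# Any GeoJSON-aware tool can consume the geometry; the IMDF semantics
# ride along in feature_type and properties.
feature = json.loads(UNIT_FEATURE)
print(feature["properties"]["category"])
```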

So if GTFS and IMDF are two success stories where is the geospatial industry still lacking?

Well unfortunately there’s a ton of work left to do and it won’t be a piece of cake. It will likely be several pieces of cake — or even many cakes. 

Let me provide a shortlist focused on just a tiny subset of geospatial data: those used in vehicle navigation applications. This includes:

  • Street addresses
  • Street names
  • Street centerlines
  • Street classifications
  • Street signage and traffic lights
  • Vehicle restrictions
  • Lane information
  • Temporary restrictions
  • Construction information

Talk to any city or regional government around the globe and they will probably have a geospatial database with at least some of this information contained within it. But while they may use the same tools to create and edit the data — typically Esri ArcGIS — you will find that none of them use the same specification for how they organize the data. 

In the US alone there are about 600 cities with a population greater than 50,000. Looking at the entire globe this problem obviously multiplies to tens of thousands of cities.

So, imagine the scale of the problem if you’re a global mapping organization like Google Maps, Apple Maps, Waze, HERE or TomTom. You have the same ‘lack of a standard container’ issue that everybody in the shipping industry had back in 1955.

So what’s stopping progress?

Part of it is the natural inertia of standards bodies. While incredibly valuable, the standards developed by these bodies tend to emerge at an incredibly slow pace. Just as importantly, the fact that a standard is developed is no guarantee that it will get used.

From my experience at least it’s much more efficient for a single organization to work on a specification and have that organization push adoption of the specification. Then, once some momentum is built, the organization works with the appropriate standards body to get that specification turned into a standard. This approach worked for shipping containers. It worked for GTFS. And it’s working for IMDF. 

How could this be achieved?

Well we need an organization to take the lead in pushing standard data exchange formats. Preferably that organization would also be in a position to get various industries to adopt it. 

Who could fulfill this role?

Well — here’s looking at you, Esri. 

Esri could do it, but, frankly they’re not. 

Yes, Esri has developed a number of standard data models for various industries, but they are few and far between. Furthermore, they’re extremely hard to find on Esri’s website, and when you do find references to them you’ll discover they are thin and contain a number of broken links.3

The most important missing element: all of Esri’s data models are optional, suggestions at best. There seems to be little encouragement to use them, and there is rarely a highly visible import/export option for a standard data model in the various renditions of ArcGIS.

But what about this ‘Living Atlas’ that Esri has developed? Isn’t that a beautiful aggregation of all the data anyone would ever need? Unfortunately, no. While Esri has put a humongous and admirable effort into collating, normalizing and aggregating data, the latency involved in assembling it and keeping it maintained is too high for critical uses. Organizations still need access to the original source data.

But what about the great work of organizations like the US Department of Transportation on data standards like the Work Zone Data Exchange (WZDx)? The work of this group is extremely encouraging and hopefully they’ll make further progress. But it still needs to be broadly adopted, not only in the US but also around the globe.

Whether Esri develops and publishes the data models is one question; we could also wait for standards like WZDx to be finalized. Either way, Esri still has an extremely important role to play: encouraging, enabling and making it insanely easy for organizations to import and export data that adhere to these standard data models. This has to be part of the UI. It has to be part of the workflow. It has to be part of the standard data pipeline. It can’t just be an afterthought buried deep inside some data interoperability extension.

So, Esri, are you up to the challenge of becoming a Malcolm McLean, or will you forever be content with everyone having to live in a world of various sized boxes, wooden crates, containers and sacks?

______________________

1 To learn more about Malcolm McLean, watch this 3-minute video from the New York Times.

2 It’s interesting to note that the ISO specification for shipping containers continues to use imperial units as its main driver. Today’s standard ISO containers usually measure 8 ft. 6 in. high, and they continue to be 8 ft. (2,438 mm) wide. The most common lengths are 20 and 40 ft.; other lengths include 24, 28, 44, 45, 46, 53 and 56 ft. Because the standard is so widely adopted, it’s doubtful that the standard dimensions will ever be changed to, say, 2,500 mm wide, unless of course the French try to get involved.

3 Try visiting Esri.com. There’s no section on their site for standard data models. If you search their site for ‘data model’ you get a number of random pages for various industries, but nothing concrete and well organized. As another example, if you search for ‘address data model’ the first result is a blog post titled “New Release of Local Government Information Model Supports Upcoming Address and Facilities Maps and Apps”. This sounds promising, until you discover the article was written 11 years ago and contains a link inviting you to “download the information model from ArcGIS.com”. When you click on the link you are forced to sign in to ArcGIS.com and are presented with this lovely user experience:

Esri ArcGIS Online Error Message

As the 45th President of the United States was apt to say: “Sad!”