Real Time Spatial Analytics Using AWS Cloud Search

Analysts use tools to perform various types of spatial analysis such as:

  • Where is the cheapest home insurance within 50 miles of Dallas?
  • Which locations are most suitable, given the income, population and other demographics of places within a 25-mile radius of New York City?
  • Which zip codes have the highest crime rates within 25 miles of Chicago?

However, when we try to convert this analysis into a real-life operational system with high traffic, low latency and little room for error, most tools that perform well offline don’t live up to expectations.

Amazon Web Services (AWS), the cloud computing platform from Amazon, is one of the leaders in cloud-based hosting. Over the years, it has steadily added software services that take advantage of its hardware platform. One of these is the search service “AWS Cloud Search”, which is itself used to power Amazon’s high-performance e-commerce search. The same technology is now available to other customers for their own search applications.

The case study below shows how AWS Cloud Search can be used for geo-searching and spatial analytics to find the cheapest home insurance within a given area. The heat map below shows how home property values (and, with them, home insurance prices) vary across the nation.

 

The sample data underlying the heat map is given in the table below.

The challenge, from a real-time operational spatial search perspective, is to find the cheapest home insurance within a 200-mile radius of San Francisco.

To do this with AWS Cloud Search, we first need to set up a search domain, which involves the following broad activities:

  • Creating and configuring the search domain
  • Uploading data to the search domain. The indexed fields in our data include home insurance rates, home prices, zip code, and latitude and longitude.
  • Searching the data within AWS Cloud Search and controlling the search results

To calculate distances between places, we use the cosine formula (the spherical law of cosines). Details on the math and logic behind it can be found here.
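The distance calculation can be sketched in plain Python as follows. This is a minimal illustration of the spherical law of cosines, not the code that runs inside AWS Cloud Search. The Earth radius (6371 km) and the kilometre-to-mile factor (0.6214) are the constants that appear in the article's query; the function name `distance_miles` and the Sacramento coordinates are assumptions added for illustration. Python's trigonometric functions expect radians, so every coordinate is converted from degrees first.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as in the article's query
KM_TO_MILES = 0.6214      # conversion factor, as in the article's query


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles using the spherical law of cosines.

    All four arguments are in decimal degrees.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_lambda = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(delta_lambda))
    # Clamp to [-1, 1] so floating-point rounding cannot push acos out of range.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * EARTH_RADIUS_KM * KM_TO_MILES


# San Francisco (coordinates from the article) to Sacramento
# (illustrative coordinates, not part of the original data).
print(round(distance_miles(37.7833, -122.4167, 38.5816, -121.4944), 1))
```

The clamp before `acos` is a defensive choice: for two nearly identical points the cosine can come out marginally above 1 because of rounding, which would otherwise raise a `ValueError`.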

Once the documents have been indexed, the query below returns all places within a 200-mile radius of San Francisco.

dis_rank="&rank-dis=acos(sin(37.7833)*sin(3.141*lat/(1000000*180))%2Bcos(37.7833)*cos(3.141*lat/(1000000*180))*cos(-122.4167-(-3.141*(long-18100000)/(100000*180))))*6371*0.6214"; threshold="&t-dis=..200"

The query uses the following latitude and longitude for San Francisco (in degrees) and search radius (in miles):

  • latitude = 37.7833
  • longitude = -122.4167
  • radius = 200
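To make the logic of the search concrete, the sketch below reproduces it offline in Python: compute each place's distance from San Francisco, keep those within the radius, and pick the lowest insurance rate. It is an illustration under stated assumptions, not how AWS Cloud Search executes the query. The record layout, the helper names `distance_miles` and `cheapest_within`, the city coordinates and especially the annual rates are all hypothetical and are not taken from the article's data set.

```python
import math

MILES_PER_RADIAN = 6371 * 0.6214  # Earth radius in km times km-to-miles factor


def distance_miles(lat1, lon1, lat2, lon2):
    """Spherical law of cosines; inputs in decimal degrees, result in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2)
                 * math.cos(math.radians(lon2 - lon1)))
    # Clamp to guard against rounding slightly outside acos's domain.
    return math.acos(max(-1.0, min(1.0, cos_angle))) * MILES_PER_RADIAN


def cheapest_within(records, lat, lon, radius_miles):
    """Return the record with the lowest rate inside the radius, or None."""
    nearby = [r for r in records
              if distance_miles(lat, lon, r["lat"], r["lon"]) <= radius_miles]
    return min(nearby, key=lambda r: r["annual_rate"]) if nearby else None


# Hypothetical sample records: the rates are invented for illustration only.
records = [
    {"name": "Sacramento", "lat": 38.5816, "lon": -121.4944, "annual_rate": 900},
    {"name": "San Jose", "lat": 37.3382, "lon": -121.8863, "annual_rate": 1100},
    {"name": "Fresno", "lat": 36.7378, "lon": -119.7871, "annual_rate": 750},
    {"name": "Los Angeles", "lat": 34.0522, "lon": -118.2437, "annual_rate": 600},
]

best = cheapest_within(records, lat=37.7833, lon=-122.4167, radius_miles=200)
print(best["name"], best["annual_rate"])
```

With these made-up rates, Los Angeles has the lowest rate but lies well outside the 200-mile radius, so the sketch returns Fresno. A linear scan like this is fine for a handful of rows; the point of pushing the calculation into a search service is to get the same answer at low latency over a full data set.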

When you pass this query to AWS Cloud Search, the response speed is on par with the best search engines in the world. Tuning and maintaining a system to deliver such performance would take a team years. So, when you do a cost-benefit analysis on operationalizing your real-time spatial analytics, consider outsourcing key parts of your infrastructure to the search infrastructure that powers “Earth’s Largest Store”!

===

This analysis was written by vHomeInsurance.com, a home insurance data analytics service.
