How to find the nearest cafes, attractions a free taxi through the eyes of a programmer

Services that solve any tasks in the context of our location quite firmly entrenched in our lives. Most smartphones can if you have access to the Internet to call us a taxi, calculate how long the bus will arrive, to route traffic and various user preferences or to show friends nearby. Tasks like finding the nearest café or attractions become trivial and can usually be solved without access to the world wide web. In this article I want to consider some tools for solving such problems and compare their performance among themselves.

problem Statement


You need to choose the tools to develop the service, in which users periodically upload your location, and other people looking for their "neighbors". Examples of services that solve such problems can be the services of a taxi, a social network and games like Ingress.

/ >

how we do


A theoretical introduction is in article, learn more — Wikipedia. Further, as will be discussed purely practical issues.
To solve the problem will be implemented class-adapters for multiple selected services. Data interface adapters presented in listing:

the
from abc import ABCMeta,, abstractmethod,


class AbstractStorage(object):
__metaclass__ = ABCMeta

@, abstractmethod,
def prepare_storage_for_experiment(self, test_data):
pass

@, abstractmethod,
def experiment_search(self, test_data):
pass

@, abstractmethod,
def experiment_update(self, test_data):
pass

@, abstractmethod,
def clear_storage(self):
pass


The time dimension is performed using Profilehooks.

approximations


    Here and further, all code written in Python; all of the following tools can be run from other common programming languages, unless specified otherwise. The ability to speed up your system by rewriting everything in a faster programming language like c/c++ in this article will not be considered, although it can be used in combat, if it is proved the expediency of such decision. the

  1. In the system, all queries are processed sequentially, which is equivalent to the existence of the request queue before the reporting module and the module in one thread; when using the system in combat the developed module is likely to handle requests in multiple threads.
  2. the
  3. All tests are run on a laptop with the following set of iron: 8 Gb RAM, Intel Core i5 2,6 GHz, SSD. Configuration of server iron are likely to be the other.
  4. the
  5. All the tools you will use with the default configuration except the same amount of allocated RAM(where this point is configurable with standard tools). Configuring the selected tools in this article will not be considered.

Implementation of


String(document or another — depending on the adopted terminology) consists of id and the coordinate pairs in an internal representation system. For each of the columns in the constructed index in the case that the system allows you to do this. All implementations will be submitted to the code responsible for search and update. Link to the full project code on github will be given at the end of the article.

Implementation 1. MongoDB v3.2.6
Link to documentation on how to geosearch

The code responsible for testing the search speed and updates:

the
@timecall(immediate=True)
def experiment_search(self, test_data):
def find_point(point):
cursor = self.collection.find(
{
MongoStorage.key_position:
{
'$near':
{
'$geometry':
{
'type': "Point",
'coordinates': [point['lng'], point['lat']]
},

}
}
}
)
return cursor[0] if cursor.count() > 0 else None

@timecall(immediate=True)
def experiment_update(self, test_data):

for t in test_data:
self.collection.update_one(
{
MongoStorage.key_id: t["id"]
},
{
'$set': {
MongoStorage.key_position: {
'type': "Point",
'coordinates': [t['position']['lng'], t['position']['lat']]
}
}
}
)

The Explain for the search query:

the
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "taxi_geo_experiment_test_db.taxi_driver_collection",
"indexFilterSet" : false,
"parsedQuery" : {
"key_position" : {
"$near" : {
"$geometry" : {
"type" : "Point",
"coordinates" : [
37.3816328351611,
55.01604115262
]
},
"$maxDistance" : 10000
}
}
},
"winningPlan" : {
"stage" : "GEO_NEAR_2DSPHERE",
"keyPattern" : {
"key_position" : "2dsphere"
},
"indexName" : "key_position_2dsphere"
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "host",
"port" : 27017,
"version" : "3.2.6",
"gitVersion" : "05552b562c7a0b3143a729aaa0838e558dc49b25"
},
"ok" : 1
}

See that is used geoindeks.

Implementation 2.1. PostgreSQL v9.5.2 using ST_DWithin
Link to documentation (postgis)

The code responsible for testing the search speed and updates:

the
@timecall(immediate=True)
def experiment_search(self, test_data):
cursor = self.db_connect.cursor()

for item in test_data:
request = "SELECT * FROM {table_name}" \
"WHERE ST_DWithin({column_geo},ST_MakePoint({lat},{lng}), 10000)".format(
table_name=PostgreSQLStorage.table_name,
column_geo=PostgreSQLStorage.column_position,
lat=item["lat"],
lng=item["lng"])
cursor.execute(request)
search_result = cursor.fetchall()


@timecall(immediate=True)
def experiment_update(self, test_data):
cursor = self.db_connect.cursor()

for item in test_data:
request = "UPDATE {table_name} set {update_column_name}=ST_MakePoint({lat},{lng})" \
"where {id_column_name}={id}".format(
table_name=PostgreSQLStorage.table_name,
update_column_name=PostgreSQLStorage.column_position,
id_column_name=PostgreSQLStorage.column_id,
lat=item["position"]["lat"],
lng=item["position"]["lng"],
id=item["id"]
)
cursor.execute(request)
self.db_connect.commit()

The Explain for the search query:

the
 Index Scan using geo_index on taxi_drivers (cost=0.14..10.51 rows=1 width=36)
Index Cond: (driver_position && '0101000020E6100000C8EA77DD72C44B404C0305B698B04240'::geography)
Filter: (('0101000020E6100000C8EA77DD72C44B404C0305B698B04240'::geography && _st_expand(driver_position, '10000'::double precision)) AND _st_dwithin(driver_position, '0101000020E6100000C8EA77DD72C44B404C0305B698B04240'::geography, '10000'::double precision, true))

Also see the use of geoindeks.

Implementation 2.2. PostgreSQL v9.5.2 using the ST_Distance
Link to documentation (postgis)

The code responsible for testing of speed of search:
the
@timecall(immediate=True)
def experiment_search(self, test_data):
cursor = self.db_connect.cursor()

for item in test_data:
request = "SELECT * FROM {table_name}" \
"WHERE ST_Distance({column_geo},ST_MakePoint({lat},{lng})) < 10000".format(
table_name=PostgreSQLStorage.table_name,
column_geo=PostgreSQLStorage.column_position,
lat=item["lat"],
lng=item["lng"])
cursor.execute(request)
search_result = cursor.fetchall()


The code responsible for testing the refresh rate, does not differ from the implementation described in the previous paragraph.
The Explain for the search query:
the
 Seq Scan on taxi_drivers (cost=0.00..8241.00 rows=10000 width=60)
Filter: (_st_distance(driver_position, '0101000020E61000003B2CDE5519AA4B402B1697185FED4240'::geography, '0'::double precision, true) < '10000'::double precision)

In this case, the index is not used, you will view all the values in the table, which is significantly slower.
More about EXPLAIN in PostgreSQL.

Implementation 3. Redis v3.2.0
Link to documentation on how to heofonum

The code responsible for testing the search speed and updates:

the
@timecall(immediate=True)
def experiment_search(self, test_data):
i = 0
for item in test_data:
command = "GEORADIUS {key} {lng} {lat} {dist_km} km WITHCOORD WITHDIST".format(
key=RedisStorage.KEY_DRIVERS,
lat=item["lat"],
lng=item["lng"],
dist_km=10
)
result = self._redis.execute_command(command)

@timecall(immediate=True)
def experiment_update(self, test_data):
for item in test_data:
command_rm = "ZREM {key} \"{id}\"".format(
key=RedisStorage.KEY_DRIVERS,
id=item["id"]
)
self._redis.execute_command(command_rm)
command_add = "GEOADD {key} {lng} {lat} \"{id}\"".format(
key=RedisStorage.KEY_DRIVERS,
lat=item["position"]["lat"],
lng=item["position"]["lng"],
id=item["id"]
)
self._redis.execute_command(command_add)

Explain analog for redis is not, as such the team is not necessary, redis is not designed for complex queries, in which such functionality may be required.
It is easy to see another feature in redis to be removed from the key(the closest equivalent key in the SQL table; the key can contain a simple value, for example, the number and complex — for example, many) one of geoobjects, it is necessary to use a knowledge of internal representation: ZREM removes the value from the set.

Output


Testing was conducted on 30 000 objects and the same number of queries. If necessary, you can test on other sets of values, replacing the appropriate parameters in the code. The test results presented in the table:
the the the the the
Tool Time searching Time updated
MongoDB 320.415 14.275
PostgreSQL(ST_DWithin) 114.878 8.908
PostgreSQL(ST_Distance) 1829.920
(implementation and result of the same PostgreSQL(ST_DWithin))
Redis 1098.604 5.016

All the data in the table are in seconds, the value of the average of 5 tests.

Link in the project repository.

If you know another tool for the effective solution of the assigned tasks — write in comments, would consider with interest.

Update 1: Added implementation of PostgreSQL using ST_Distance. This implementation does not use an index, respectively, the search runs longer.
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Address FIAS in the PostgreSQL environment. Part 4. EPILOGUE

PostgreSQL: Analytics for DBA

Audit Active Directory tools with Powershell releases. Part 1