Designing a document-oriented API in PostgreSQL: Finding documents (Part 3)
In parts one and two of this short series, I showed different ways of saving a document and updating its search fields, and also how to save many documents transactionally. In this article I'll look at options for querying them.
Article based on information from habrahabr.ru
The best way to search for documents
In the first part we created a table that looks like this:
create table my_docs(
id serial primary key,
body jsonb not null,
search tsvector,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now()
)
Since we control how the data is stored, we can write our own functions to retrieve it in all sorts of interesting ways! The hard work (saving, updating, and so on) is done; now let's have some fun.
Finding a document by ID
Each document has an id field associated with it, thanks to save_document. This is still Postgres, so every table needs a primary key, and we store it inside the document itself. I made mine an integer, but you could also use a bigint with Twitter snowflake IDs if you like. For now we'll stick with the serial type.
The function for this is pretty straightforward:
create function find_document(tbl varchar, id int)
returns jsonb
as $$
  // find the row by its primary key
  var result = plv8.execute("select * from " + tbl + " where id=$1;", [id]);
  return result[0] ? result[0].body : null;
$$ language plv8;

select * from find_document('customers', 20);
This is the simplest function possible: it takes a table name and the id to look up, and runs the fastest query of them all (which we love!): a primary key lookup. Speed: we like it.
Now let's create one for bulk queries. Here I want to pass in a criteria document and get back the first match. That only works properly if I order the results, so I add an orderby parameter with id as its default value:
create function find_document(
  tbl varchar,
  criteria varchar,
  orderby varchar default 'id'
)
returns jsonb
as $$
  // this will throw if the criteria isn't valid JSON
  JSON.parse(criteria);
  var results = plv8.execute(
    "select body from " + tbl +
    " where body @> $1 order by body ->> '" + orderby + "' limit 1;",
    [criteria]);
  return results[0] ? results[0].body : null;
$$ language plv8;

select * from find_document('customers', '{"last": "Conery"}', 'first');
We can, however, expect strange behavior depending on which driver we use. The first thing to note is that I'm overloading find_document, because Postgres allows it. This means the only difference between our original find-by-id function and this one is the argument list.

For the Npgsql driver that's not a problem. For the node_pg driver it is. Because I gave the orderby parameter a default value, some confusion arises about which function to run. As far as I can tell, node_pg doesn't care about the types of the function arguments, only their number. So if we try to call the "find by id" function above, our second function fires instead.

Npgsql (the .NET driver) has no such problem. So if you run into this, just rename one of the functions, or drop the default value of the parameter.
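If renaming feels heavy-handed, you can also pin the overload at the SQL level. This is a sketch, not node_pg API specifics: the function names mirror the ones defined above, and whether you need this at all depends on how your driver types its parameters.

```javascript
// Sketch: sidestep the overload ambiguity by being explicit in the SQL text.
function findByCriteriaSql() {
  // Three placeholders: only the 3-argument find_document can match.
  return "select * from find_document($1, $2, $3);";
}

function findByIdSql() {
  // Casting the second placeholder to int tells Postgres which overload
  // we mean, even when the driver sends every parameter as untyped text.
  return "select * from find_document($1, $2::int);";
}

console.log(findByCriteriaSql());
console.log(findByIdSql());
```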
Another thing worth noting: I declared criteria as varchar. That's technically incorrect, but it makes the API a little nicer. If I had declared it as jsonb, every query would have to be written like this:
select * from find_document('customers','{"last": "Conery"}'::jsonb, 'first');
The difference is small, since we'll mostly be using this API from code (which is the subject of the next post).
Filtering
Let's do the same thing, but this time return multiple documents:
create function filter_documents(
  tbl varchar,
  criteria varchar,
  orderby varchar default 'id'
)
returns setof jsonb
as $$
  // this will throw if the criteria isn't valid JSON
  JSON.parse(criteria);
  var results = plv8.execute(
    "select body from " + tbl +
    " where body @> $1 order by body ->> '" + orderby + "';",
    [criteria]);
  var out = [];
  for (var i = 0; i < results.length; i++) {
    out.push(results[i].body);
  }
  return out;
$$ language plv8;

select * from filter_documents('customer_docs', '{"last": "Conery"}');
This one is interesting. My return type is setof jsonb, which means I have to return some number of jsonb rows. It wasn't obvious how to do that with PLV8, and there may well be a better way than mine, but I know for sure this works.

Once I have the result (the rows of our document table), I loop over it and push each row's jsonb body field into an array, which I then return.

This works because the body field's jsonb is, at bottom, text. It is not a JavaScript object; if it were, I'd get an error (the old [object Object] parsing silliness).
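You can see the string-versus-object distinction in plain JavaScript, outside of plv8:

```javascript
// What plv8 hands back for a jsonb column is its text form, not a live object.
var bodyAsText = '{"last": "Conery", "first": "Rob"}';

// Pushing strings into the result array works fine:
var out = [];
out.push(bodyAsText);
console.log(out[0]); // the JSON text, intact

// If body were a real object, naive string coercion would mangle it:
var bodyAsObject = JSON.parse(bodyAsText);
console.log("" + bodyAsObject); // "[object Object]" - the old silliness

// JSON.stringify is the safe way back to text when you do have an object:
console.log(JSON.stringify(bodyAsObject));
```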
SQL injection
Many of you will have noticed that orderby is concatenated directly into the query string. If you let your users write SQL against your database, then yes, that's a problem. But hopefully you'll be calling this function through a driver that parameterizes your queries for you, so that this:
db.filter("customers", {
  last: "Conery",
  orderBy: "a';DROP TABLE test; SELECT * FROM users WHERE 't' = 't"
}, function(err, res) {
  console.log(err);
  console.log(res);
});
... will not work. Why not? Because ideally you're doing something like this:
select * from filter_documents($1, $2, $3);
If not, you get what you deserve :).
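One caveat: parameterization protects the criteria value, but orderby ends up concatenated into the SQL inside the function, because an identifier can't be a query parameter. A whitelist on the application side is a simple extra guard. This is a minimal sketch; the list of allowed field names is an assumption for illustration.

```javascript
// Hypothetical guard: only let orderby be one of the fields we expect.
var ALLOWED_ORDER_FIELDS = ["id", "first", "last", "email"];

function safeOrderBy(requested) {
  // Fall back to the same default the Postgres function uses.
  return ALLOWED_ORDER_FIELDS.indexOf(requested) > -1 ? requested : "id";
}

console.log(safeOrderBy("last")); // "last"
console.log(safeOrderBy("a';DROP TABLE test; SELECT * FROM users WHERE 't' = 't")); // "id"
```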
Full-text search
Let's round this out with proper full-text search over our documents. This is my favorite part:
create function search_documents(tbl varchar, query varchar)
returns setof jsonb
as $$
  var sql = "select body, ts_rank_cd(search, to_tsquery($1)) as rank from " +
    tbl +
    " where search @@ to_tsquery($1)" +
    " order by rank desc;";
  var results = plv8.execute(sql, [query]);
  var out = [];
  for (var i = 0; i < results.length; i++) {
    out.push(results[i].body);
  }
  return out;
$$ language plv8;

select * from search_documents('customers', 'jolene');
This is all pretty simple if you know how full-text indexing works in Postgres. Here we're simply querying the search field (indexed with a GIN index for speed), which we update on every save. This query is lightning fast and very easy to use.
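One thing to watch: to_tsquery expects operators between terms, so raw user input like "jolene conery" would raise a syntax error. A small helper can join the words before they reach the function. toTsQueryInput is a hypothetical name, a sketch of one way to do it; Postgres also ships plainto_tsquery, which handles raw input for you.

```javascript
// Hypothetical helper: join whitespace-separated words with "&"
// ("all terms must match") so to_tsquery() gets valid input.
function toTsQueryInput(userInput) {
  return userInput
    .trim()
    .split(/\s+/)
    .filter(function (t) { return t.length > 0; })
    .join(" & ");
}

console.log(toTsQueryInput("jolene"));         // "jolene"
console.log(toTsQueryInput("jolene  conery")); // "jolene & conery"
```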
Making the index more flexible
In the two functions that take criteria (finding and filtering), I use the "containment" operator: the little @> symbol.

This operator is specific to jsonb and lets us use a GIN index on the body field. That index looks like this:
create index idx_customers on customers using GIN(body jsonb_path_ops);
The special sauce here is jsonb_path_ops. It tells the indexer to optimize for jsonb "containment" operations (essentially: is this piece of jsonb contained within that other piece of jsonb). This makes the index both faster and smaller.
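To make the "containment" idea concrete, here is a tiny JavaScript model of (roughly) the question body @> criteria asks: does the left document contain every key/value pair of the right one? This is a sketch of the semantics only. It models plain objects, not jsonb's array containment rules, and says nothing about how the GIN index works internally.

```javascript
// Rough JS model of jsonb "containment" (@>): the left value contains
// the right value if every key/value in the right appears in the left.
function contains(left, right) {
  return Object.keys(right).every(function (k) {
    var l = left[k], r = right[k];
    if (r !== null && typeof r === "object" && l !== null && typeof l === "object") {
      return contains(l, r); // recurse into nested documents
    }
    return l === r;
  });
}

var doc = { first: "Rob", last: "Conery", company: { name: "Tekpub" } };
console.log(contains(doc, { last: "Conery" }));              // true
console.log(contains(doc, { company: { name: "Tekpub" } })); // true
console.log(contains(doc, { last: "Smith" }));               // false
```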
Now, I could cite a pile of sources and articles about how PostgreSQL beats MongoDB and friends when it comes to reads and writes. But that would be misleading.
Read and write speed
If you put a single PostgreSQL server up against a single MongoDB server, MongoDB will look quite silly: Postgres will smoke it on almost every metric. That's because Postgres was designed to work this way, as a "scale-up" database.
If you optimize MongoDB and add servers to distribute the load, the numbers get closer, but then you're dealing with a horizontal system that may not behave the way you expect. This is all very debatable, of course, but a couple of things are worth noting:

- Indexing in Postgres slows writes down. So if it's destroying your write performance, index only what you actually need to query (by specifying (body -> 'my_field') when creating the GIN index).
- If you access something frequently (an email address, say), just copy it into a separate column and make it UNIQUE. You can handle the synchronization in your code or in a separate function.
In the next article I'll dive into ways to drive all of this from code!