bookradar.org: a book search service

I want to introduce the project I have been working on for the last few months and tell you how it came about. bookradar.org is a search engine for books in online stores. The service is designed for people who love to read. With it, they can find out where to buy the book they want and save money on the purchase. The more expensive the book, the more its price varies between stores.



For example, Phil Rosenzweig's book "The Halo Effect... and the Eight Other Business Delusions That Deceive Managers" costs anywhere from 537 to 885 RUB depending on the store. The difference is quite significant.

From idea to result on ...



An example of a search query for a book:



The Idea



I love reading books and buy them often. Honestly, I buy them faster than I have time to read them. Some books can sit on my shelf for three years before I get to them. I own two e-readers, a Kindle Paperwhite and a Nook Simple Touch, and I also read a lot of books on the computer as PDFs. E-books undoubtedly have advantages: they can be bought instantly and read in the dark. Even so, I prefer paper.

In the middle of November 2012 I signed up for the online course “MongoDB for Developers”. A couple of weeks had passed since it started. Although I had no idea yet of everything MongoDB could do, I already loved the technology and wanted to apply the new knowledge in practice. That is when I got the idea to build a book search engine.

At that time I had already seen one similar website, but to be honest, it was not very convenient and did not always give reliable results. Maybe I should have looked for other sites, but at the time I didn't. I only went looking for them after launching my own. There turned out to be many more than I could have imagined: more than a dozen. That did not bother me, though. I can offer users a more convenient search, and in the future, I hope, a more accurate and broader one. And there is certainly no shortage of ideas :)

Implementation


At first I figured a couple of days off would be enough to carry out the plan, but as often happens, it took much more time. At work I use Django, but to be honest, after a couple of years of working with it I had grown rather bored. Django is a great framework; I just wanted something new, so I decided to build the project in Flask. Why Flask? Someone had sent me a link to a tutorial on building a blog with Flask and MongoDB and said he used Flask in his own projects. It seemed interesting to try.



I asked my wife to draw the design (hi Pauline!), noting that it had to be as simple as possible, without any shadows or gradients. That saved me time on the layout and made changes easier.

A month went by... Honestly, I was very tired from spending every evening and weekend programming, and my enthusiasm was fading fast. I urgently needed feedback from real people, so I released the project in a minimally working state. It worked and really did search for books in stores, but it suffered from many small bugs and flaws. All of this was deliberate, to get the first version out faster. It lacked even such basic things as handling of 404 and 500 errors, not to mention things like the History API.

After receiving positive feedback from colleagues, I was inspired to continue. The next month was mainly devoted to polishing, mostly fixing those small flaws.
It also became clear that the real data did not match what I had originally expected. I had to change the schema of the documents in the database, split collections, and change the algorithms.

Real data is full of surprises. For example, I had hoped that the ISBN could serve as a unique identifier for a book. It turned out not to be so: one book can have many ISBNs. I don't know who fills in this data at the stores, but an invalid ISBN is the least of the problems. The ISBN field may contain anything at all, from random numbers to phrases in plain language. The letter “O” may be typed instead of the digit zero, and the Cyrillic “Х” instead of the Latin “X”.
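A minimal sketch of how such ISBN values could be cleaned up and validated. This is not the project's actual code: the function names and the look-alike table are assumptions made for illustration, and only the standard ISBN-10 and ISBN-13 checksum rules are taken as given.

```python
import re

# Characters that look like "0" or "X" but are something else.
# Hypothetical table for illustration, not the project's real list.
LOOKALIKES = str.maketrans({
    "O": "0", "o": "0",   # Latin letter O typed instead of zero
    "О": "0", "о": "0",   # Cyrillic letter O
    "Х": "X", "х": "X",   # Cyrillic letter Kha instead of Latin X
    "x": "X",             # lowercase Latin x
})


def _isbn10_ok(isbn):
    # Weights run from 10 down to 1; "X" stands for the value 10.
    total = sum((10 - i) * (10 if ch == "X" else int(ch))
                for i, ch in enumerate(isbn))
    return total % 11 == 0


def _isbn13_ok(isbn):
    # Weights alternate 1, 3, 1, 3, ...
    total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                for i, ch in enumerate(isbn))
    return total % 10 == 0


def normalize_isbn(raw):
    """Return a cleaned ISBN-10/13 string, or None if the value is not valid."""
    cleaned = raw.translate(LOOKALIKES)
    cleaned = re.sub(r"[\s\-]", "", cleaned)  # drop spaces and hyphens
    if re.fullmatch(r"[0-9]{9}[0-9X]", cleaned):
        return cleaned if _isbn10_ok(cleaned) else None
    if re.fullmatch(r"[0-9]{13}", cleaned):
        return cleaned if _isbn13_ok(cleaned) else None
    return None
```

Anything that does not survive the checksum is returned as None, so the caller can decide whether to drop the value or fall back to matching on other fields.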

And then I ran into a performance problem. Python, like any other dynamic language, is not as fast as one would like. For web applications this usually does not matter, because a slow website is typically held back by database operations or the network. I profiled the code and optimized the algorithms, and came to the conclusion that the algorithms were fine: it was Python and the database that were slow.
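The post does not say which profiler was used. As a rough sketch, the standard library's cProfile is one common way to find out where the time goes; `parse_line` and `parse_all` below are hypothetical stand-ins for the real parsing code.

```python
import cProfile
import io
import pstats


def parse_line(line):
    # Hypothetical stand-in for the real per-record parsing work.
    return [field.strip() for field in line.split(";")]


def parse_all(lines):
    return [parse_line(line) for line in lines]


def profile(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    rows, report = profile(parse_all, ["a; b; c"] * 100_000)
    print(report)
```

If the report shows the time spread evenly over simple string calls rather than concentrated in one bad function, that points at the interpreter itself rather than the algorithm.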

I had to rewrite a significant part of the project in a faster language, naturally a statically typed one. That is how Scala, a programming language I had never used before, ended up in the project :)

Why Scala? I had to choose between C, C++ and Scala. The first segmentation fault made me cross C off the list. C is a good language, but clearly not the best fit for this task. Of course I had looked at language performance benchmarks, but to be honest I did not believe that Java/Scala could come close to the speed of C++. So I wrote my own benchmark: I took a small piece of the parser and implemented it in Python, Scala and C++.

Here are the results of parsing a 1.5 GB file:

CPython: 4 min 12 sec
PyPy: 2 min 48 sec
Scala: 57 sec
C++: 47 sec

The algorithm is the same everywhere, and the parsing uses only string operations. In Scala I also tried a standard XML parser, but it was much slower.
As you can see, Scala's speed really is close to that of C++. And Scala code is easier to write and debug. Besides, I have someone to ask when something is unclear (hi Ivan!).
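To illustrate what parsing with string operations alone can look like, here is a small sketch. It is written in Python for consistency with the rest of the examples, and it assumes an XML-like feed; the tag names `offer`, `name` and `price` are hypothetical and not taken from the real data.

```python
def text_between(source, open_tag, close_tag, start=0):
    """Return (text, end_index) for the first open_tag...close_tag span
    at or after `start`, or (None, -1) if there is no such span."""
    begin = source.find(open_tag, start)
    if begin == -1:
        return None, -1
    begin += len(open_tag)
    end = source.find(close_tag, begin)
    if end == -1:
        return None, -1
    return source[begin:end], end + len(close_tag)


def iter_offers(feed):
    """Yield one dict per <offer> record, using only str.find and slicing."""
    position = 0
    while True:
        body, position = text_between(feed, "<offer>", "</offer>", position)
        if body is None:
            return
        name, _ = text_between(body, "<name>", "</name>")
        price, _ = text_between(body, "<price>", "</price>")
        yield {"name": name, "price": price}
```

This approach skips everything a real XML parser does (attributes, entities, nesting checks), which is exactly why it can be faster, and also why it only suits feeds with a known, regular layout.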

While writing Scala I often caught myself thinking, “this code looks wrong, I need to figure out how to do it properly.” Such perfectionism could have slowed development considerably. I told myself, “Dude, you don't know this language, so you can't write it properly anyway. Just write code that works!” It was psychologically hard to force myself to write code that merely works, because I wanted it to be beautiful. But in the end I pulled myself together and wrote code that works.

TDD. All the parsers, both Python and Scala, were covered by tests from the very beginning. This is a place where tests speed up development right away. The frontend, on the other hand, still has no tests.

Monetization


On the first day after launch, colleagues asked me when I was going to make the search paid. I'm not going to do that. Monetization is simple and clear: the stores' affiliate programs. I am not planning to place ads.

Present


Many of the bugs present at launch have already been fixed, although a noticeable number remain. One of the interesting problems I had to face is gluing together records of the same book from different sources. The gluing algorithm now works quite well, but it still stumbles occasionally. If you run into this, email me.
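The post does not describe how the gluing works. Purely as an illustration of the problem, here is one possible approach: treat records as the same book when they share at least one ISBN, directly or through a chain of other records. The record layout with an `isbns` list is an assumption, not the project's schema.

```python
from collections import defaultdict


def glue_by_isbn(records):
    """Group records that share an ISBN, directly or transitively.

    `records` is a list of dicts with an "isbns" list (hypothetical layout).
    Returns a sorted list of groups, each a list of indices into `records`.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # ISBN -> index of the first record that carried it
    for index, record in enumerate(records):
        for isbn in record["isbns"]:
            if isbn in first_seen:
                # Merge this record's group with the earlier one.
                parent[find(index)] = find(first_seen[isbn])
            else:
                first_seen[isbn] = index

    groups = defaultdict(list)
    for index in range(len(records)):
        groups[find(index)].append(index)
    return sorted(groups.values())
```

Since ISBNs in store data can be missing or wrong, a real solution would likely also need to compare titles and authors; this sketch covers only the ISBN part.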

The frontend is written in Python/Flask, the backend in Scala, and the database is MongoDB.

Future Plans


There are many ideas, but in the near future I plan to work on search quality and fix minor bugs. New features are fun, and they will appear, but later. By the way, your comments can affect the order in which they appear.

I hope you like my service.
I would be very grateful for advice, criticism and suggestions.

Take a look: www.bookradar.org!
Article based on information from habrahabr.ru
