The Microsoft Research team broke the world record for sorting
sortbenchmark.org holds annual competitions for sorting large data sets. One of them is MinuteSort: within one minute, read as many records as possible from disk, sort them, and write the result to a file. The competition has two categories: Indy, with no restrictions on the hardware used, and Daytona, in which only ordinary commodity ("off-the-shelf") computers may be used.
The Microsoft Research team far exceeded Yahoo's 2009 record in the Daytona category. Their cluster of 250 machines with 1,033 disks sorted 1,401 gigabytes of data. That is almost three times Yahoo's result (500 GB), even though Yahoo's cluster was almost six times larger (5,624 disks on 1,406 machines). Moreover, Microsoft's cluster also beat last year's record in the Indy category (1,353 gigabytes).
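The comparison in the paragraph above is simple arithmetic, and a short Python check makes the ratios explicit. Only the figures quoted in the article are used; the throughput lines are derived averages computed here for illustration, not numbers reported by the competition.

```python
# Figures as reported in the article for the MinuteSort Daytona category.
MICROSOFT = {"data_gb": 1401, "machines": 250, "disks": 1033}
YAHOO_2009 = {"data_gb": 500, "machines": 1406, "disks": 5624}
SECONDS = 60  # MinuteSort time budget

data_ratio = MICROSOFT["data_gb"] / YAHOO_2009["data_gb"]      # how much more data Microsoft sorted
machine_ratio = YAHOO_2009["machines"] / MICROSOFT["machines"]  # how many times more machines Yahoo used
disk_ratio = YAHOO_2009["disks"] / MICROSOFT["disks"]           # how many times more disks Yahoo used

# Data sorted per machine, Microsoft relative to Yahoo.
per_machine_ratio = (MICROSOFT["data_gb"] / MICROSOFT["machines"]) / (
    YAHOO_2009["data_gb"] / YAHOO_2009["machines"]
)

# Average rate at which input was consumed (decimal units). Actual disk traffic
# is at least twice this, because every record is both read and written.
microsoft_gb_per_s = MICROSOFT["data_gb"] / SECONDS
per_disk_mb_per_s = MICROSOFT["data_gb"] * 1000 / SECONDS / MICROSOFT["disks"]

print(f"data: {data_ratio:.2f}x, machines: {machine_ratio:.2f}x, disks: {disk_ratio:.2f}x")
print(f"data per machine: {per_machine_ratio:.1f}x")
print(f"aggregate: {microsoft_gb_per_s:.1f} GB/s, per disk: {per_disk_mb_per_s:.1f} MB/s")
```

The ratios come out at about 2.8x more data, with Yahoo using about 5.6x more machines and 5.4x more disks, which means Microsoft sorted roughly 16 times as much data per machine.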
These results were made possible by a technology called Flat Datacenter Storage (FDS). Microsoft did not use the MapReduce-based solutions that are typical for such problems. For some tasks, sorting among them, the pieces of data cannot be processed independently on different nodes, as MapReduce solutions do. Moving huge amounts of data between nodes cannot be avoided.
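To see why sorting forces data movement, here is a minimal sketch of a generic sample sort (range partitioning), simulated in a single Python process. It is a textbook technique chosen for illustration, not Microsoft's FDS implementation; the function names, node count, and sample size are hypothetical. Every record must be routed to the node that owns its key range, so with random input about (N-1)/N of all records change nodes before any node can finish sorting.

```python
import bisect
import random


def choose_splitters(nodes, sample_per_node=32, seed=0):
    """Pick len(nodes) - 1 key boundaries from a random sample of every node's data."""
    rng = random.Random(seed)
    sample = []
    for records in nodes:
        sample.extend(rng.sample(records, min(sample_per_node, len(records))))
    if not sample:
        return []
    sample.sort()
    step = len(sample) / len(nodes)
    return [sample[int(step * i)] for i in range(1, len(nodes))]


def distributed_sort(nodes):
    """Sample sort simulated in one process: partition by key range,
    exchange records all-to-all, then sort each node's share locally.

    Returns the per-node sorted outputs and the number of records that
    had to leave the node they started on.
    """
    splitters = choose_splitters(nodes)
    inboxes = [[] for _ in nodes]
    moved = 0
    for src, records in enumerate(nodes):
        for key in records:
            # Destination = index of the key range this key falls into.
            dst = bisect.bisect_right(splitters, key)
            inboxes[dst].append(key)
            if dst != src:
                moved += 1  # in a real cluster this record crosses the network
    # After the exchange, nodes sort independently; every key on node i is
    # smaller than every key on node i + 1, so the outputs concatenate in order.
    return [sorted(inbox) for inbox in inboxes], moved


rng = random.Random(42)
nodes = [[rng.randrange(10**9) for _ in range(10_000)] for _ in range(8)]
outputs, moved = distributed_sort(nodes)
merged = [key for part in outputs for key in part]
total = sum(len(records) for records in nodes)
print(f"globally sorted: {merged == sorted(merged)}")
print(f"{moved} of {total} records ({moved / total:.0%}) changed nodes")
```

With 8 simulated nodes, close to 7/8 of the records change nodes: the local sorts are independent, but only after an all-to-all exchange that no partitioning scheme can skip.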
FDS exploits the fact that networks have become much faster and cheaper since the MapReduce architecture was established. This made it possible to build a cluster in which every computer can communicate with every other computer simultaneously at the full speed of its network interface (a so-called full bisection bandwidth network). So instead of Hadoop, which Yahoo used in 2009, the Microsoft Research team used a network file system that lets any node access any data as if it resided on a local disk.
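The benefit of a full bisection bandwidth network can be shown with back-of-the-envelope arithmetic. The sketch below assumes a hypothetical 10 Gbit/s network interface per machine (the article does not give NIC speeds) and a crude model in which all shuffle traffic crosses the network core; an oversubscription factor k means each node effectively gets only 1/k of its NIC speed. The function names are illustrative.

```python
def bisection_bandwidth_gbit(machines: int, nic_gbit_per_s: float) -> float:
    """Bandwidth across a cut splitting the cluster in half, if the network
    has full bisection bandwidth: each node in one half can send to the
    other half at its full NIC rate simultaneously."""
    return machines / 2 * nic_gbit_per_s


def shuffle_seconds(total_gb: float, machines: int, nic_gbit_per_s: float,
                    oversubscription: float = 1.0) -> float:
    """Rough time for an all-to-all shuffle in which each node keeps 1/machines
    of its data and sends the rest to the other nodes.

    oversubscription=1.0 models full bisection bandwidth; k > 1 assumes every
    node's effective rate drops to 1/k of its NIC speed (a simplification).
    """
    per_node_gb = total_gb / machines
    outgoing_gb = per_node_gb * (machines - 1) / machines
    effective_gb_per_s = nic_gbit_per_s / 8 / oversubscription  # bits -> bytes
    return outgoing_gb / effective_gb_per_s


# Data volume and machine count from the article; the NIC speed is hypothetical.
TOTAL_GB, MACHINES, NIC_GBIT = 1401, 250, 10.0

full = shuffle_seconds(TOTAL_GB, MACHINES, NIC_GBIT)
oversubscribed = shuffle_seconds(TOTAL_GB, MACHINES, NIC_GBIT, oversubscription=4.0)
print(f"bisection bandwidth: {bisection_bandwidth_gbit(MACHINES, NIC_GBIT):.0f} Gbit/s")
print(f"shuffle, full bisection: {full:.1f} s; 4:1 oversubscribed: {oversubscribed:.1f} s")
```

Under these assumptions the shuffle takes about 4.5 s with full bisection bandwidth but four times as long on a 4:1 oversubscribed fabric, a large share of a 60-second budget that must also cover reading and writing the data on disk.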
Microsoft plans to use the FDS architecture in the data centers that serve the Bing search engine.
Article based on information from habrahabr.ru