How we built fast event stream processing on an FPGA
The device is called CEPappliance. CEP stands for Complex Event Processing, and “appliance” (it should be clear, but just in case) is simply the English word for a device.
Article based on information from habrahabr.ru
We started it in 2010 as a hobby, working on it after our day jobs through long evenings that blended smoothly into short nights, and on weekends. Over five years we built three prototypes in search of a solution that combines low latency with a simple programming model for the data-processing logic.
In 2015 we realized that we had built something decent: it processes data streams with a guaranteed latency of 2-3 microseconds. We started looking for ways to turn it into a commercial product, so that we could stop working for someone else and devote all our time to our own product. In late 2015 we found our first customer, left “the uncle” (our employer) and set out on our own.
Today we can say with confidence that the device works. We have not yet implemented everything we planned, and there is still a lot of work ahead adding new features and, occasionally, fixing bugs. But the device has already spent a year in commercial operation.
While working for “the uncle” we studied the technical aspects and requirements of trading financial instruments on exchanges, so we focused primarily on that field: automated trading (HFT, algo trading), pre-trade risk control, Direct Market Access, and so on.
Still, we managed to make CEPappliance a fairly versatile device, applicable wherever a lot of data must be processed not just quickly but with guaranteed low latency. With native support for standard network protocols and minimal delays, the device can be used in telecommunications to detect security violations in networks and to manage network load. It can be used in telematics, where a decision must be made within a few microseconds of receiving signals from sensors. At the same time, the data-processing logic can be complex. To describe (program) it, we use some techniques from Complex Event Processing (CEP).
CEPappliance was conceived and built to solve tasks that, in simplified form, can be stated as follows: with a total latency of less than 3 microseconds,
- receive input data (a signal) on a network interface in the format of Ethernet, TCP/IP, UDP, FIX, FAST, TWIME (FIX SBE) and other protocols;
- parse it and extract the user data;
- process the user data;
- generate output data (a response) and send it out through the network interface.
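The steps above can be sketched in software to make the data flow concrete. This is only an illustration in Python, not the device's implementation (which runs in FPGA hardware): the function names, the price-threshold rule and the choice of fields are assumptions, and the messages are heavily simplified (no BeginString, BodyLength or repeating groups). Only the tag=value layout, the SOH delimiter, the checksum rule and the tag numbers (35 MsgType, 55 Symbol, 270 MDEntryPx, 54 Side, 44 Price) come from the FIX standard.

```python
from typing import Dict, List, Optional, Tuple

SOH = "\x01"  # standard FIX field delimiter


def parse_fix(raw: str) -> Dict[str, str]:
    """Split a FIX tag=value message into a {tag: value} dict."""
    fields = {}
    for part in raw.split(SOH):
        if part:
            tag, _, value = part.partition("=")
            fields[tag] = value
    return fields


def checksum(body: str) -> str:
    """FIX checksum: sum of all bytes modulo 256, written as three digits."""
    return f"{sum(body.encode('ascii')) % 256:03d}"


def build_fix(fields: List[Tuple[str, str]]) -> str:
    """Serialize fields and append the trailing checksum field (tag 10)."""
    body = "".join(f"{tag}={value}{SOH}" for tag, value in fields)
    return f"{body}10={checksum(body)}{SOH}"


def user_logic(price: float, limit: float) -> Optional[str]:
    """Hypothetical user algorithm: react when the price drops below a limit."""
    return "BUY" if price < limit else None


def handle(raw: str, limit: float = 100.0) -> Optional[str]:
    msg = parse_fix(raw)                              # parse, extract user data
    decision = user_logic(float(msg["270"]), limit)   # process user data
    if decision is None:
        return None                                   # no reaction to this signal
    # Generate the response: a simplified order (35=D), 54=1 means "buy".
    return build_fix([("35", "D"), ("55", msg["55"]),
                      ("54", "1"), ("44", msg["270"])])
```

Calling `handle("35=X\x0155=ABC\x01270=99.5\x01")` returns a serialized order for the hypothetical symbol `ABC`, while a price at or above the limit returns `None`.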
CEPappliance differs from software solutions running on CPU-based architectures in that the core of the device is a field-programmable gate array (FPGA), which implements every step of the task described above.
CPU-based architectures are evolving. Hybrid variants have appeared (see Fig. 1, Fig. 2 and Fig. 3) in which the time to deliver data from the network interfaces to the CPU (and back) is reduced by moving the processing of transport-, network- and application-layer protocols from the CPU to the network card. Even so, the data delivery time is 1 to 3 microseconds one way, and it contributes significantly to the delay between receiving a signal and reacting to it (see note 1).
On the FPGA we placed the components that parse, extract and analyze the input data and format the output data on a single chip, figuratively speaking “without the intermediaries” that are always present in CPU-based solutions (see Fig. 4).

Fig. 1. Logical diagram of a traditional solution

Fig. 2. Logical diagram of a hybrid solution with a CPU and a TCP Offload Engine on the network card

Fig. 3. Logical diagram of a hybrid solution with a CPU, a TCP Offload Engine and application-level protocols implemented on the network card

Fig. 4. Logical diagram of CEPappliance
In CEPappliance the components that parse, extract and analyze the input data and format the output data are located on the FPGA chip and interact with each other directly.
To achieve this we had to “reinvent the wheel”. Recall that we started working on CEPappliance back in 2010 as a hobby, and we built it “the way it should be done”. As a result, among other things, we implemented Ethernet, TCP/IP, UDP, FIX, FAST and TWIME from scratch.
We managed to build these components so that the input data is parsed at the rate it arrives (at wire speed). The components implement the relevant standards, which are “set in stone” and rarely change. For the standard protocols we provided a configuration mechanism. For example, the FIX, FAST and TWIME protocol modules are configured with user-specified parameters and with templates or schemas that describe the structure of messages.
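To illustrate what configuring a protocol module with a schema means, here is a minimal Python sketch of schema-driven decoding. The schema, field names and layout are invented for the example and are not the real TWIME or FIX SBE schema; the point is only that the decoder itself stays fixed while the message structure is supplied as data.

```python
import struct

# Hypothetical schema: (field name, struct format code), in wire order.
# This is NOT the real TWIME schema; it only illustrates the idea.
ORDER_SCHEMA = [
    ("security_id", "I"),  # unsigned 32-bit integer
    ("price", "q"),        # signed 64-bit fixed-point mantissa
    ("quantity", "I"),     # unsigned 32-bit integer
    ("side", "B"),         # unsigned 8-bit integer
]


def compile_schema(schema):
    """Turn a field list into one precompiled struct plus the field names."""
    fmt = "<" + "".join(code for _, code in schema)  # "<": little-endian, no padding
    return struct.Struct(fmt), [name for name, _ in schema]


def decode(packer, names, payload: bytes) -> dict:
    """Decode a fixed-layout binary message according to the compiled schema."""
    return dict(zip(names, packer.unpack(payload)))
```

Changing the message structure then means editing `ORDER_SCHEMA`, not the decoder: `compile_schema` is run once at configuration time and `decode` is applied to every incoming message.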
At the same time, we assumed that the (user) data-processing algorithms can change. For example, trading strategies, or the checks a broker performs to minimize risk (pre-trade risk checks), follow changes in market conditions, upgrades to the exchange's microarchitecture, or regulators' requirements.
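As an example of the kind of user logic that changes over time, here is a minimal Python sketch of a pre-trade risk check. The three rules, their names and the limit values are assumptions chosen for illustration; real checks depend on the broker and the regulator, and this is not code for the device's own language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical risk limits; kept as data so they can change without new code."""
    max_order_qty: int
    max_notional: float
    max_position: int


def pre_trade_check(qty, price, side, position, limits):
    """Return the list of violated rules; an empty list means the order may pass.

    side is +1 for a buy and -1 for a sell; position is the current signed position.
    """
    violations = []
    if qty > limits.max_order_qty:
        violations.append("order quantity")
    if qty * price > limits.max_notional:
        violations.append("order notional")
    if abs(position + side * qty) > limits.max_position:
        violations.append("resulting position")
    return violations
```

With `Limits(100, 50000.0, 150)`, an order for 10 units at 100.0 passes, while an order for 200 units violates both the quantity and the position rule.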
Developing algorithms for an FPGA directly in “hardware” languages (VHDL, Verilog, etc.) takes much more time for coding, debugging and testing than development in high-level languages [2]. It also requires special skills that programmers who write in high-level languages usually lack. So if you plan to use an FPGA to speed up your algorithms, you will have to hand a detailed description of the algorithm to an FPGA developer who will implement it. That is sometimes highly undesirable, because disclosing the algorithm exposes its owner to the risk of losing a competitive advantage.
Our device lets users describe the data-processing algorithm themselves. For this purpose we developed:
- a high-level algorithmic language;
- a processor with an original architecture;
- an optimizing compiler that translates programs from the high-level language into code for this processor and can automatically parallelize programs across several processors running simultaneously.
Having our own programming language, processor and compiler lets us implement functions in the FPGA (in hardware) and make them available to the user. Such a function can be part of the algorithm or the whole algorithm; that depends on how feasible such an implementation is and on the user's wishes and resources. In some cases this approach can significantly speed up program execution in CEPappliance.
Having let users program CEPappliance themselves, we obviously had to provide tools for debugging those programs. Without such tools it would be hard to take full advantage of CEPappliance. We therefore developed a device emulator that is 100% compatible with the device itself. Once a program has been debugged on the emulator, it is enough to change the configuration (in most cases only the IP addresses) and the program can be run on the device right away.
Besides debugging, the emulator lets you estimate how long the device itself will take to execute the program. These latency estimates can then be used to optimize the program.
For automated testing of user programs written for CEPappliance we have a dedicated tool, Test Bench, which reads test scenarios described in a table and executes them. The same set of tests can be run against both the device and the emulator.
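The idea of running one table of scenarios against several targets can be sketched as follows. This Python example is not the actual Test Bench: its table format, column names and the `program` function are assumptions, and both "targets" are the same Python function standing in for the emulator and the device.

```python
import csv
import io

# Hypothetical test table: one scenario per row (input price, expected reaction).
TEST_TABLE = """name,input,expected
below_limit,99.5,BUY
at_limit,100.0,
above_limit,100.5,
"""


def program(price: float) -> str:
    """Stand-in for a user program; returns "" when there is no reaction."""
    return "BUY" if price < 100.0 else ""


def run_tests(table_text, target):
    """Run every row of the table against a target; return the failed test names."""
    failed = []
    for row in csv.DictReader(io.StringIO(table_text)):
        if target(float(row["input"])) != row["expected"]:
            failed.append(row["name"])
    return failed


# One table, two targets. Here both are the same function, standing in
# for the emulator and the device.
TARGETS = {"emulator": program, "device": program}
```

Running `run_tests(TEST_TABLE, target)` for each entry in `TARGETS` gives one list of failures per target, so a difference between the two lists would point at a mismatch between the targets rather than at the test table.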
To sum up: our boards are installed in the Moscow Exchange data center and are trading successfully. We cannot comment on the trading results, since that is not our area, but the client is very pleased (and this text was approved by the client).
Ahead lies a lot of work on developing the device, finding customers outside trading, and plenty of new ideas!
Note 1. How this delay arises when data is exchanged over TCP/IP is described in [1]. How it can be reduced by a hybrid architecture that uses an FPGA has also been described elsewhere.
References
1. S. Larsen and P. Sarangam, “Architectural Breakdown of End-to-End Latency in a TCP/IP Network,” International Journal of Parallel Programming, Springer, 2009.
2. David F. Bacon, Rodric Rabbah, and Sunil Shukla. FPGA Programming for the Masses. ACM Queue, Vol 11(2), February 2013.