The pitfalls of using Excel Power Query to MySQL and MySQL to automate reporting

Hi all.
Come 2016, which means it's time to upgrade tools to simplify the boring mechanical work. Departments analysts, marketing, sales often face challenges when updating reporting:
1. The data we have to piece together from multiple sources.
2. The reports are prepared in Excel, which imposes significant restrictions on the amount of data being processed.
3. Changes to preconfigured by developers of discharge it's usually not the fastest.
If reports need to be updated weekly or even daily, then this procedure becomes very stressful even for the most patient. Using the Excel add-ins Power Query and write data in MySQL, you can make updates to most of the report before pressing the button "Update":
1. Data from any number of sources imported using SQL queries in the regular Excel spreadsheet.
2. Even from a large database can be recorded in Excel only a small portion of data (e.g., totals for a date range grouped according to the required columns).
3. Changes to the report can be made just by changing the SQL query. Next, form the right report standard Excel tools.
In this article I will show you how to set up and automatically fill in a simple MySQL database (for example, the discharge statistics of keywords of Yandex Metrics), and then one button to refresh reports in Excel using the Power Query add-in. Power Query has a very strange peculiarities of work in compiling SQL queries (especially dynamic), which we will examine in the second part of the article.
The choice of MySQL (or any other popular database) is free, relatively easy to work with fairly large databases without technical tricks. As an example, we will use Amazon Web Services: cheap (in most cases the instance will be free for you for 12 months).
Let's start (if you already have a database ready data, then you can go directly to the tab in Excel):
1. Register for AWS (if no accounts) launched the easiest t2 instance.micro and go to it via SSH. You can see the brief instructions in the previous post habrahabr.ru/post/265383. Please note that we will need first in the option list instance on the Amazon Linux AMI. You need to set rules to allow an appeal to the instance on the desired ports:

For security purposes, it is better to put restrictions on IP address. If you have a dynamic IP, then it is a troublesome option. Also, sometimes restricting access to MYSQL by IP causes an error in Excel. If set to any IP, then it works.
2. Execute consecutive commands described in the documentation docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-ug.pdf. We need a head "Tutorial: Installing a LAMP Web Server on Amazon Linux". Remember the password that you enter when you run the command "sudo mysql_secure_installation". For convenience, install phpMyAdmin as described at the end of this Chapter. If you copy-paste from the documentation the line "sudo sed -i-e 's/127.0.0.1/your_ip_address/g' /etc/ht tpd/conf.d/phpMyAdmin.conf", note that sometimes when you copy to "httpd" there is an extra space.
After these steps, your instance should open this page:

3. Go under the root user and the password that was entered during configuration. To access the database from the "outside" (i.e. from Excel), we need a user other than root. Start it in the phpMyAdmin interface in the Users menu --> Add user. Add user stats, set the password and assign it the privileges of SELECT and INSERT. Total will receive:

4. Now let's create the database data:
5. In this example, we fill the base statistics of visits for keywords from Yandex Metrics. For this, create table seo (note that the id column you have to select A_I (auto increment)):

6. For the statistics of keywords of Yandex Metrics you can use the following script. As parameters you need to specify start and end dates of discharge (variables $startDate and $endDate), authentication token (in the code there is a description how to get it), the counter number from which you want to obtain statistics, and the database parameters: ID of the instance, username (we have a "stats"), password, and database name (we have "data"). Copy to the root folder of the instance this code and run the command "php seo.php".
PHP Code to upload data Yandex Metrics
<?php
// start and end period of discharge
$startDate = '2015-10-01';
$endDate = '2015-12-31';
// the token for requests to API Yandex Metrics
// to obtain the token, create the app on https://oauth.yandex.ru/
// give the app permission to read statistics Yandex Metric
// after creating the option to take the app ID and substitute in the end https://oauth.yandex.ru/authorize?response_type=token&client_id=
// I press to Authorize and get the token
$atoken = ";
// number counter
$project = ";
// connect to database and check if everything is in order
$con = mysqli_connect("ec2-....compute.amazonaws.com","stats","your password here", "data");
if (mysqli_connect_errno()) {
echo "Failed to connect to MySQL:" . mysqli_connect_error();
exit();
} else {
echo "Connection successfull \n";
}
if ($con->query("SET NAMES 'utf8'")) {
echo "set Names \n";
} else {
printf("Error: %s\n", $con- > error);
exit();
}
// the file in which to write the results of discharge (in case of an error writing to the database it is possible to see which line was last recorded and, accordingly, gave the error)
$fname = "data.txt";
$fp = fopen($fname, "w");
fclose($fp);
function getBatch($startDate, $endDate, $project, $offset, $limit, $atoken, $fname, $con) {
// make request to API Yandex Metrics
// documentation https://tech.yandex.ru/metrika/doc/api2/api_v1/attributes/visitssource_/search-docpage/
$ch = curl_init();
$options = array(
CURLOPT_URL => 'https://api-metrika.yandex.ru/stat/v1/data?oauth_token='.$atoken.'&id='.$project.'&accuracy=full&date1='.$startDate.'&date2='.$endDate.'&dimensions=ym:s:<attribution>SearchPhrase&metrics=ym:s:visits&limit='.$limit.'&offset='.$offset,
CURLOPT_RETURNTRANSFER => TRUE
);
curl_setopt_array($ch, $options);
$data = curl_exec($ch);
curl_close($ch);
// translate the query response as a JSON object in the array
$data = json_decode($data, true);
// look what the answer was from Yandex Metrics
//var_dump($data);
// pass the array $data['data'] and record the period of discharge, a key phrase and number of visits
for ($i=0; $i < count($data['data']); $i++) {
file_put_contents($fname, $startDate."\t".$endDate."\t".$data['data'][$i]['dimensions'][0]['name']."\t".$data['data'][$i]['metrics'][0]."\n", FILE_APPEND);
if ($con->query("INSERT INTO seo (startDate, endDate, query, visits) VALUES ('".$startDate."', '".$endDate."', '".mysqli_real_escape_string($con, substr($data['data'][$i]['dimensions'][0]['name'], 0, 255))."', ".$data['data'][$i]['metrics'][0].")")) {
//echo "Record done \n";
} else {
printf("Error: %s\n", $con- > error);
exit();
}
}
if (count($data['data']) == 0) {
return 'done';
} else {
return 'more';
}
}
// https://tech.yandex.ru/metrika/doc/api2/api_v1/data-docpage/
// maximum number of rows in a single table - 10 000
// in a loop unloaded $limit rows, increasing first line of discharge $offset to the value of $limit
$offset = 1;
$limit = 10000;
$res = 'more';
do {
$res = getBatch($startDate, $endDate, $project, $offset, $limit, $atoken, $fname, $con);
$offset += $limit;
} while ($res == 'more');
mysqli_close($con);
If there are errors when connecting to the database, they will appear in the console and will be terminated. In case of successful execution get the keyword statistics for the selected period:

Excellent data were obtained. View how to get them to Excel.
Using Power Query to load data in Excel
Power Query is an add-in that extends Excel's data upload. You can download it here www.microsoft.com/en-us/download/details.aspx?id=39379. To work with MySQL, you may need the MySQL Connector and Visual Studio (available when installing from the distribution).
1. After installation select MySQL:

2. As a base we specify ID of our instance (as in the script) ec2-....compute.amazonaws.com. Database data. For login select Database:
3. In the opened window, double-click on the table seo and get:

In this window you can control the queries, changing columns and number of lines. When the database is small, then it works. However, if the data size exceeds 20MB, Excel on most computers will simply hang from such a request. Besides, it would be nice to change the date of your request or other parameters.
Dynamic queries in Power Query you can do using the built-in language M msdn.microsoft.com/en-us/library/mt253322.aspxbut the query is extremely unstable in terms of changing any parameters in them. To ensure that the request remains "constant" let's do the following trick:
1. First we make a table in which you specify the parameters you want. In our example, this is the date of discharge. Format cells with values is better to put as a test, because Excel likes to change the format of cells in its discretion:

2. Create query Power Query "From the table", which will simply duplicate this table:

3. In the options request must indicate the format of the second column as Text, otherwise the subsequent SQL query will be incorrect. Next, click "Close & load".

In total we received a query Power Query to a regular table, which will take the value start and end of discharge.
To make the SQL query will need to disable one option: go to the options and Settings -- > request Settings -- > Privacy and select the "Ignore the privacy levels for potential performance improvements". Click OK.

4. Now make a query to our database, indicating the beginning and end of the period table values from item 3. Reconnect to the database in Power Query and click "Advanced editor" in the menu.

For example, we want to get the sum of the visits that brought a keyword that contains "2015". In the M language request looks like this:
the
let
Source = MySQL.Database("ec2-....compute.amazonaws.com", "data", [Query="select sum(visits) from seo where startDate>='"&Text.From(Table1{0}[Value])&"' and endDate < ='"&Text.From(Table1{1}[Value])&"' and query like '%2015%';"])
in
Source
The startDate and endDate parameters specify the values in the table from paragraph 3. When prompted "To run this native query to the database required permissions" click "Edit permission", check that all the parameters are tightened correctly and execute the query. Now received a response from a SQL query be used with the standard formulas in Excel in the usual.
5. Important! When you update upload in the next time, then it has to be done in the following way (others give error)
— change the date in the table from paragraph 1
— go to Data menu --> Connections and click "Update all":

In this case, all the queries will be executed correctly and your reports will be updated automatically. Total for report upgrade you only need to change the query parameters and click "Update all".
Комментарии
Отправить комментарий