Controlling the standard Sailfish OS media player with voice commands

Many people know and use Android features such as Google Now and Google Assistant, which allow users not only to obtain useful information at the right time and search the Internet, but also to control the device with voice commands. Unfortunately, Sailfish OS (an operating system developed by the Finnish company Jolla and the Russian company Open Mobile Platform) does not provide such a feature out of the box. As a result, it was decided to make up for this lack on our own. One of the functions of the resulting solution is the ability to control the music player with voice commands; its technical side is discussed in this article.

Implementing the recognition and execution of voice commands requires four simple steps:

  1. develop a system of commands,
  2. implement speech recognition,
  3. implement the identification and execution of commands,
  4. add voice feedback.

It is assumed that, for a better understanding of the material, the reader already has basic knowledge of C++, JavaScript, Qt, QML, and Linux, and is acquainted with an example of their interaction within Sailfish OS. Prior familiarity with the lectures on related subjects given at the Sailfish OS Summer School in Innopolis in the summer of 2016, as well as other articles about developing for the platform already published on Habr, may also be useful.

Developing the command system


Let us examine a simple example, limited to five functions:

  • start playing new music,
  • resume playback,
  • pause playback,
  • skip to the next song,
  • go to the previous song.

To start playing new music, we need to check for an open instance of the player (creating one if necessary) and start playback in random order. The command "play some music" will activate this function.

In the case of resuming and pausing playback, we need to check the state of the player and, if it is available, start playback or pause it. The command "Play" will resume playback; the commands "Pause" and "Stop" will pause it.

Navigation through songs follows the same principle of checking for a running audio player. The commands "Forward" and "Next" activate forward navigation; the commands "Back" and "Previous" activate backward navigation.
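
The command set described above can be summarized as a simple lookup table from recognized phrase to player action. The sketch below is plain JavaScript; the action names on the right are illustrative labels, not part of the player's real API:

```javascript
// Map each recognized phrase to a player action (action names are illustrative).
const commands = {
    "play some music": "shuffleAndPlay",
    "play": "play",
    "pause": "pause",
    "stop": "pause",
    "forward": "next",
    "next": "next",
    "back": "previous",
    "previous": "previous"
};

// Resolve a recognized query to an action, or null for an unknown command.
function resolveCommand(query) {
    const action = commands[query.toLowerCase()];
    return action !== undefined ? action : null;
}
```

Synonyms such as "Pause" and "Stop" simply map to the same action, which keeps adding new phrasings cheap.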

Speech recognition


The process of voice recognition is divided into three stages:

  1. record the voice command to a file,
  2. recognize the command on the server,
  3. identify the command on the device.

Recording the voice command to a file

First, we need to create a user interface for capturing voice commands. To simplify the example, recording is started and finished by pressing a button, since implementing detection of the beginning and end of a voice command deserves a separate article.

IconButton {
    property bool isRecording: false

    width: Theme.iconSizeLarge
    height: Theme.iconSizeLarge
    icon.source: isRecording ? "image://theme/icon-m-search"
                             : "image://theme/icon-m-mic"

    onClicked: {
        if (isRecording) {
            isRecording = false
            recorder.stopRecord()
            yandexSpeechKitHelper.recognizeQuery(recorder.getActualLocation())
        } else {
            isRecording = true
            recorder.startRecord()
        }
    }
}

From the code above, it is clear that the button uses standard sizes and icons (a convenient feature of Sailfish OS for unifying application interfaces) and has two states. In the first state, when no recording is in progress, pressing the button starts recording a voice command. In the second state, while recording is active, pressing the button stops the recording and starts recognition.
To record speech we will use the QAudioRecorder class, which provides a high-level interface for controlling audio input, together with QAudioEncoderSettings to configure the recording process.

class Recorder : public QObject
{
    Q_OBJECT

public:
    explicit Recorder(QObject *parent = 0);

    Q_INVOKABLE void startRecord();
    Q_INVOKABLE void stopRecord();
    Q_INVOKABLE QUrl getActualLocation();
    Q_INVOKABLE bool isRecording();

private:
    QAudioRecorder _audioRecorder;
    QAudioEncoderSettings _settings;
    bool _recording = false;
};

Recorder::Recorder(QObject *parent) : QObject(parent) {
    _settings.setCodec("audio/PCM");
    _settings.setQuality(QMultimedia::NormalQuality);
    _audioRecorder.setEncodingSettings(_settings);
    _audioRecorder.setContainerFormat("wav");
}

void Recorder::startRecord() {
    _recording = true;
    _audioRecorder.record();
}

void Recorder::stopRecord() {
    _recording = false;
    _audioRecorder.stop();
}

QUrl Recorder::getActualLocation() {
    return _audioRecorder.actualLocation();
}

bool Recorder::isRecording() {
    return _recording;
}

This code specifies that the command will be recorded in WAV format at normal quality, and defines methods to start and stop recording, obtain the location of the audio file, and query the recording state.

Recognizing the command on the server

To convert the audio file to text, the Yandex SpeechKit Cloud service will be used. All that is required to get started with it is to obtain a token in the developer cabinet. The service documentation is quite detailed, so we will dwell only on specific points.

The first step is to send the recorded command to the server.

void YandexSpeechKitHelper::recognizeQuery(QString path_to_file) {
    QFile *file = new QFile(path_to_file);
    if (file->open(QIODevice::ReadOnly)) {
        QUrlQuery query;
        query.addQueryItem("key", "API_KEY");
        query.addQueryItem("uuid", _buildUniqID());
        query.addQueryItem("topic", "queries");

        QUrl url("https://asr.yandex.net/asr_xml");
        url.setQuery(query);

        QNetworkRequest request(url);
        request.setHeader(QNetworkRequest::ContentTypeHeader, "audio/x-wav");
        request.setHeader(QNetworkRequest::ContentLengthHeader, file->size());
        _manager->post(request, file->readAll());
        file->close();
    }
    file->remove();
    delete file;
}

Here a POST request to the Yandex server is formed, passing the previously obtained token, a unique device ID (in this case, the MAC address of the Wi-Fi module), and the request topic (here "queries" is used, since voice interaction with a device most commonly consists of short and precise commands). The request headers specify the format of the audio file and its size; the body contains the file contents. After the request is sent to the server, the file is deleted as no longer needed.

In response, the SpeechKit Cloud server returns XML with recognition variants and a confidence value for each. We use standard Qt facilities to extract the desired information.

void YandexSpeechKitHelper::_parseResponce(QXmlStreamReader *element) {
    double idealConfidence = 0;
    QString idealQuery;
    while (!element->atEnd()) {
        element->readNext();
        if (element->tokenType() != QXmlStreamReader::StartElement) continue;
        if (element->name() != "variant") continue;
        QXmlStreamAttribute attr = element->attributes().at(0);
        if (attr.value().toDouble() > idealConfidence) {
            idealConfidence = attr.value().toDouble();
            element->readNext();
            idealQuery = element->text().toString();
        }
    }
    if (element->hasError()) qDebug() << element->errorString();
    emit gotResponce(idealQuery);
}

Here the response is scanned and, for each variant tag, the recognition confidence is checked. If a new variant has higher confidence, it is saved and scanning continues. When the whole response has been examined, a signal with the selected command text is emitted.
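
For reference, a response of the shape this code expects might look as follows (an illustrative fragment; the exact attributes are described in the SpeechKit Cloud documentation):

```xml
<?xml version="1.0" encoding="utf-8"?>
<recognitionResults success="1">
    <variant confidence="0.91">play some music</variant>
    <variant confidence="0.45">pay some music</variant>
</recognitionResults>
```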

Identifying the command on the device

Finally, it remains to identify the command. At the end of the YandexSpeechKitHelper::_parseResponce method, as mentioned above, the gotResponce signal is emitted with the text of the command. Next, we need to handle it in the QML code of the program.

Connections {
    target: yandexSpeechKitHelper
    onGotResponce: {
        switch (query.toLowerCase()) {
        case "play some music":
            dbusHelper.startMediaplayerIfNeed()
            mediaPlayer.shuffleAndPlay()
            break;
        case "play":
            mediaPlayerControl.play()
            break;
        case "pause":
        case "stop":
            mediaPlayerControl.pause()
            break;
        case "forward":
        case "next":
            mediaPlayerControl.next()
            break;
        case "back":
        case "previous":
            mediaPlayerControl.previous()
            break;
        default:
            generateErrorMessage(query)
            break;
        }
    }
}

Here the Connections element is used to process the incoming signal and compare the recognized command with the voice command templates defined earlier.

Controlling a running player


If the audio player is already open, it is possible to communicate with it through the standard D-Bus interface inherited from the Linux "big brother". It can be used to navigate the playlist and to start or pause playback. This is done using the QML element DBusInterface.

DBusInterface {
    id: mediaPlayerControl

    service: "org.mpris.MediaPlayer2.jolla-mediaplayer"
    iface: "org.mpris.MediaPlayer2.Player"
    path: "/org/mpris/MediaPlayer2"

    function play() {
        call("Play", undefined)
    }

    function pause() {
        call("Pause", undefined)
    }

    function next() {
        call("Next", undefined)
    }

    function previous() {
        call("Previous", undefined)
        call("Previous", undefined)
    }
}

This element uses the D-Bus interface of the standard audio player, defining four basic functions. The value undefined is passed to call because the invoked D-Bus methods take no arguments.

It should be noted that, to move to the previous song, the Previous method is called twice, since a single call merely restarts the current song from the beginning.

Starting playback from scratch


There is nothing difficult about controlling an already running player. However, if we want to start playing music when the player is closed, a problem arises: by default, there is no way to launch the standard player and immediately start playing the entire collection.

But we should not forget that Sailfish OS is an operating system with open source code available for modification. Consequently, the problem can be solved in two stages:

  • extend the functions provided by the player via its D-Bus interface;
  • implement launching the player (if necessary) and starting playback immediately afterwards.

Extending the functions of the standard audio player

In addition to org.mpris.MediaPlayer2.Player, the standard audio player provides the interface com.jolla.mediaplayer.ui, defined in the file /usr/share/jolla-mediaplayer/mediaplayer.qml. It follows that we can modify this file, adding the needed function.

DBusAdaptor {
    service: "com.jolla.mediaplayer"
    path: "/com/jolla/mediaplayer/ui"
    iface: "com.jolla.mediaplayer.ui"

    function openUrl(arg) {
        if (arg[0] == undefined) {
            return false
        }

        AudioPlayer.playUrl(Qt.resolvedUrl(arg[0]))
        if (!pageStack.currentPage || pageStack.currentPage.objectName !== "PlayQueuePage") {
            root.pageStack.push(playQueuePage, {}, PageStackAction.Immediate)
        }
        activate()

        return true
    }

    function shuffleAndPlay() {
        AudioPlayer.shuffleAndPlay(allSongModel, allSongModel.count)
        if (!pageStack.currentPage || pageStack.currentPage.objectName !== "PlayQueuePage") {
            root.pageStack.push(playQueuePage, {}, PageStackAction.Immediate)
        }
        activate()
        return true
    }
}

Here the DBusAdaptor element, which provides the D-Bus interface, was modified by adding the shuffleAndPlay method. It uses the standard functionality of the player to play all songs in random order, provided by the com.jolla.mediaplayer module, and brings the current play queue to the foreground.

Within the example, for simplicity, the system file was modified directly. However, when distributing software, such changes should be packaged as patches according to the corresponding instructions.
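
For illustration only, such a patch might look like the following unified diff of the pristine file against the modified one (the hunk header and context here are hypothetical; the real ones depend on the installed file):

```diff
--- mediaplayer.qml.orig
+++ mediaplayer.qml
@@ -1,4 +1,13 @@
 DBusAdaptor {
     service: "com.jolla.mediaplayer"
     path: "/com/jolla/mediaplayer/ui"
     iface: "com.jolla.mediaplayer.ui"
+
+    function shuffleAndPlay() {
+        AudioPlayer.shuffleAndPlay(allSongModel, allSongModel.count)
+        if (!pageStack.currentPage || pageStack.currentPage.objectName !== "PlayQueuePage") {
+            root.pageStack.push(playQueuePage, {}, PageStackAction.Immediate)
+        }
+        activate()
+        return true
+    }
```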

Now the program being developed needs to call the new method. This is done using the already familiar DBusInterface, in which a connection to the service described above is set up and a call to the function added to the player is implemented.

DBusInterface {
    id: mediaPlayer

    service: "com.jolla.mediaplayer"
    path: "/com/jolla/mediaplayer/ui"
    iface: "com.jolla.mediaplayer.ui"

    function shuffleAndPlay() {
        call("shuffleAndPlay", undefined)
    }
}

Launching the player if it is closed

Finally, the last thing left is to start the music player if it is closed. The task can be divided into two stages:

  • launch the player itself;
  • wait for the music collection to be scanned.

void DBusHelper::startMediaplayerIfNeed() {
    QDBusReply<bool> reply =
        QDBusConnection::sessionBus().interface()->isServiceRegistered("com.jolla.mediaplayer");
    if (!reply.value()) {
        QProcess process;
        process.start("/bin/bash -c \"jolla-mediaplayer &\"");
        process.waitForFinished();

        QDBusInterface interface("com.jolla.mediaplayer", "/com/jolla/mediaplayer/ui",
                                 "com.jolla.mediaplayer.ui");
        while (true) {
            QDBusReply<bool> reply = interface.call("isSongsModelFinished");
            if (reply.isValid() && reply.value()) break;
            QThread::sleep(1);
        }
    }
}

From the code of the function it can be seen that the first step is to check for the required D-Bus service. If it is already registered in the system, the function terminates and playback can be started. If the service is not found, a new instance of the audio player is created using QProcess, waiting for its launch to complete. In the second part of the function, using QDBusInterface, the flag indicating the end of the music collection scan is polled.

It should be noted that, to make this scan flag available, two additional changes were made to the file /usr/share/jolla-mediaplayer/mediaplayer.qml.

First, the GriloTrackerModel element provided by the com.jolla.mediaplayer module was modified by adding a flag for the end of the scan.

GriloTrackerModel {
    id: allSongModel

    property bool isFinished: false

    query: {
        //: placeholder string for albums without a known name
        //% "Unknown album"
        var unknownAlbum = qsTrId("mediaplayer-la-unknown-album")

        //: placeholder string to be shown for media without a known artist
        //% "Unknown artist"
        var unknownArtist = qsTrId("mediaplayer-la-unknown-artist")

        return AudioTrackerHelpers.getSongsQuery("", {"unknownArtist": unknownArtist, "unknownAlbum": unknownAlbum})
    }

    onFinished: {
        isFinished = true
        var artList = fetchAlbumArts(3)
        if (artList[0]) {
            if (!artList[0].url || artList[0].url == "") {
                mediaPlayerCover.idleArtist = artList[0].author ? artList[0].author : ""
                mediaPlayerCover.idleSong = artList[0].title ? artList[0].title : ""
            } else {
                mediaPlayerCover.idle.largeAlbumArt = artList[0].url
                mediaPlayerCover.idle.leftSmallAlbumArt = artList[1] && artList[1].url ? artList[1].url : ""
                mediaPlayerCover.idle.rightSmallAlbumArt = artList[2] && artList[2].url ? artList[2].url : ""
                mediaPlayerCover.idle.sourcesReady = true
            }
        }
    }
}

Secondly, another function accessible via the D-Bus interface com.jolla.mediaplayer.ui was added, returning the value of the scan status flag.

function isSongsModelFinished() {
    return allSongModel.isFinished
}

A message about an incorrect command


The last part of the example is a voice message about an unrecognized command. For this we use the speech synthesis service of Yandex SpeechKit Cloud.

Audio { id: audio }

function generateErrorMessage(query) {
    var message = "Sorry, the command " + query + " was not found."
    audio.source = "https://tts.voicetech.yandex.net/generate?" +
            "text=\"" + message + "\"&" +
            "format=mp3&" +
            "lang=en-US&" +
            "speaker=jane&" +
            "emotion=good&" +
            "key=API_KEY"
    audio.play()
}

Here an Audio object is created for playing the generated speech, and the generateErrorMessage function is declared to form a request to the Yandex server and start playback. The request takes the following parameters:

  • text — the text to synthesize (the message about the unrecognized voice command),
  • format — the format of the audio file (mp3),
  • lang — the language of the phrase (English),
  • speaker — the voice to use (female),
  • emotion — the emotional coloring of the voice (friendly),
  • key — the key obtained at the beginning of the article.


Conclusion


This article has considered a simple example of controlling music playback in the standard Sailfish OS audio player using voice commands. Along the way it reviewed the basics of speech recognition and synthesis with Yandex SpeechKit Cloud using Qt tools, and the principles of interaction between programs in Sailfish OS. The material can serve as a starting point for deeper investigations and experiments with this operating system.
An example of the code in action can be seen in the accompanying video:


Author: Peter Vitovtov
Article based on information from habrahabr.ru
