My life with Android :-)

Friday, November 21, 2014

Motor boat control with Bluetooth Low Energy

My previous post about Bluetooth Low Energy applications with RFDuino and Android presented a connectionless gas sensor. That prototype was based solely on BLE advertisements; no connection was established between the scanner device (Android phone or tablet) and the sensor. While this connectionless operation is advantageous for sensors that just broadcast their measurement data, more complex scenarios that require e.g. authentication or a communication session cannot be implemented in this model. The prototype I am going to present in this post demonstrates connection-oriented operation between RFDuino and an Android application.

Watch this video to see what the application is about.



The story behind this motor boat project is that I bought this RC-controlled model boat while I was working in the UK. But when we moved back to Hungary, I lost the RC controller. So the boat sat unused for years until I realized how great it would be to use an Android device as the controller. Hence I quickly integrated the RFDuino with the motor boat's original control circuitry and wrote the necessary software. As you can see in the video, it has quite a respectable range, even though I did not dare go into the October water of Lake Balaton where the second part of the video was shot (water temperature: some 10 degrees centigrade).

First about the "hardware". I did not have the circuit diagram of the original RC controller in the boat so I had to experiment a bit. By following the motors' cables I quickly found two stocky three-legged components that looked like switching transistors (although the labels on them were no longer readable after all those years in service). I removed one end of the resistors that I thought connected the bases of these transistors to the rest of the RC control circuit and tried out how much current was needed to switch on the motors. To my pleasant surprise, 1 mA was enough, so I suspect these are actually not plain transistors but power-switching ICs. Anyway, RFDuino outputs can provide 1 mA of switching current, so I just connected the other ends of those removed resistors to two spare RFDuino I/O ports. Lo and behold, it worked: if RFDuino drives either of these pins high, the respective motor starts. One minor remaining problem was the power supply of the RFDuino. The motor boat uses a 7.2 V battery and RFDuino needs 3.3 V, so I added an LM1117-3.3V regulator circuit between the battery and the RFDuino and the "hardware" was ready.



Do you know about BLE concepts like services and characteristics? If not, please read this presentation for a quick introduction. In short: BLE services (which are grouped into GATT profiles) are composed of characteristics, which are key-value pairs decorated by meta-information that the BLE specification calls descriptors. RFDuino with its default firmware cannot implement any standard GATT profile; it exposes only its own custom service. This is a major disadvantage in product-level development but it makes RFDuino code super-easy because the programmer does not have to deal with BLE details. The RFDuino custom service defines a "read" and a "write" characteristic. Whatever the client (in our case, the Android application) writes into the "write" characteristic is delivered to the RFDuino code through an incoming-data callback. If the RFDuino code calls the RFduinoBLE.send(v) method, the data appears in the "read" characteristic. The Android client can register for notifications on the "read" characteristic, so it receives a callback whenever the RFDuino code invokes RFduinoBLE.send(). There are additional callbacks for service connections and disconnections.
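To make this mapping concrete, here is how the custom service looks from the Android side. This is only a reference sketch: the UUIDs below are the commonly published RFduino values, so verify them against the attached project before relying on them.

    import java.util.UUID;

    // The RFduino custom service as seen by the Android client. The UUIDs are the
    // commonly published RFduino values (an assumption - check the attached project).
    public final class RfduinoUuids {
        public static final UUID SERVICE =
                UUID.fromString("00002220-0000-1000-8000-00805f9b34fb");
        // "read" characteristic: RFduinoBLE.send() data arrives here as notifications
        public static final UUID RECEIVE =
                UUID.fromString("00002221-0000-1000-8000-00805f9b34fb");
        // "write" characteristic: data written here triggers the incoming-data callback
        public static final UUID SEND =
                UUID.fromString("00002222-0000-1000-8000-00805f9b34fb");
        // "disconnect" characteristic: written to force a clean disconnect (see below)
        public static final UUID DISCONNECT =
                UUID.fromString("00002223-0000-1000-8000-00805f9b34fb");

        private RfduinoUuids() {}
    }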

You can download the Android project and the RFDuino source at the end of this post. You have to be logged in to the Sfonge site to access them.

First about the RFDuino code. Besides the familiar setup() and loop() functions, you will see three RFDuino-specific functions. RFduinoBLE_onConnect() is called when a client connects to the RFDuino BLE service, RFduinoBLE_onDisconnect() is called on disconnection and RFduinoBLE_onReceive() is called when there is incoming data. Only one complication in the code requires explanation: this is a powerful motorboat and it can go out of BLE radio range very quickly. In early versions the boat became uncontrollable in this case, meaning that it just kept executing the last command received. That was not a nice feature, so I implemented a heartbeat message that is sent every 5 seconds. The loop() method starts by sending the heartbeat to the client and goes to sleep. It wakes up after 5 seconds and if it finds that the client did not send back the heartbeat, it stops the motors. Otherwise the client just sends a bit mask indicating which motors to stop or start and the RFDuino responds with the same mask, informing the client that the motors were indeed started or stopped. This means that the arrows on the user interface showing which motors are running represent the actual state of the boat, which is advantageous if something goes wrong.
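On the Android side the heartbeat only has to be echoed back. Below is a minimal sketch of that client-side logic; it assumes (my assumption, not necessarily the project's exact wiring) that the heartbeat arrives as a notification on the "read" characteristic and simply has to be written back to the "write" characteristic.

    import android.bluetooth.BluetoothGatt;
    import android.bluetooth.BluetoothGattCallback;
    import android.bluetooth.BluetoothGattCharacteristic;

    // Echo every notification back to the boat so it knows we are still in radio range.
    public class HeartbeatEcho extends BluetoothGattCallback {

        private BluetoothGattCharacteristic writeCharacteristic; // obtained after service discovery

        @Override
        public void onCharacteristicChanged(BluetoothGatt gatt,
                                            BluetoothGattCharacteristic characteristic) {
            byte[] value = characteristic.getValue();
            if (writeCharacteristic != null && value != null && value.length > 0) {
                writeCharacteristic.setValue(value);          // send the same byte(s) back
                gatt.writeCharacteristic(writeCharacteristic);
            }
        }
    }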

Then about the Android side. The code starts by discovering the device. Once the device is discovered, we connect to it with the connectGatt() method, then discover its services with the gatt.discoverServices() method. Once the service discovery callback arrives, we retrieve the RFDuino service (getService(); we expect the RFDuino custom service) and obtain the characteristic handles (getCharacteristic()). We write the "read" characteristic's client configuration descriptor to enable notifications from the RFDuino server to the Android client, so that we get a callback whenever the RFDuino side sends something to us.
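Put together, the connection sequence looks roughly like the sketch below. It is a condensed illustration rather than the project's actual code: error handling is omitted and the UUIDs are the assumed RFduino values listed earlier.

    import java.util.UUID;
    import android.bluetooth.*;
    import android.content.Context;

    // Connect, discover services, then enable notifications on the "read" characteristic.
    public class RfduinoConnection extends BluetoothGattCallback {

        private static final UUID SERVICE =
                UUID.fromString("00002220-0000-1000-8000-00805f9b34fb");
        private static final UUID RECEIVE =
                UUID.fromString("00002221-0000-1000-8000-00805f9b34fb");
        private static final UUID SEND =
                UUID.fromString("00002222-0000-1000-8000-00805f9b34fb");
        // Standard Client Characteristic Configuration descriptor
        private static final UUID CLIENT_CONFIG =
                UUID.fromString("00002902-0000-1000-8000-00805f9b34fb");

        private BluetoothGattCharacteristic readCharacteristic;
        private BluetoothGattCharacteristic writeCharacteristic;

        public void connect(Context context, BluetoothDevice device) {
            device.connectGatt(context, false, this);
        }

        @Override
        public void onConnectionStateChange(BluetoothGatt gatt, int status, int newState) {
            if (newState == BluetoothProfile.STATE_CONNECTED) {
                gatt.discoverServices();
            }
        }

        @Override
        public void onServicesDiscovered(BluetoothGatt gatt, int status) {
            BluetoothGattService service = gatt.getService(SERVICE);
            readCharacteristic = service.getCharacteristic(RECEIVE);
            writeCharacteristic = service.getCharacteristic(SEND);
            // Enable notifications so RFduinoBLE.send() data reaches onCharacteristicChanged().
            gatt.setCharacteristicNotification(readCharacteristic, true);
            BluetoothGattDescriptor config = readCharacteristic.getDescriptor(CLIENT_CONFIG);
            config.setValue(BluetoothGattDescriptor.ENABLE_NOTIFICATION_VALUE);
            gatt.writeDescriptor(config);
        }
    }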

Disconnection is worth detailing because there is an RFDuino speciality here. Normally, one can just disconnect from the service with a disconnect() call. RFDuino, however, is left in a limbo state in this case: the BLE session is disconnected but the RFDuino application does not receive a callback and cannot accept a new connection request. The "disconnect" characteristic has to be written to (the value does not matter) for the RFDuino server to disconnect properly.
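A minimal sketch of that teardown, assuming the same RFduino UUIDs as above:

    import java.util.UUID;
    import android.bluetooth.BluetoothGatt;
    import android.bluetooth.BluetoothGattCharacteristic;
    import android.bluetooth.BluetoothGattService;

    public class RfduinoDisconnect {

        private static final UUID SERVICE =
                UUID.fromString("00002220-0000-1000-8000-00805f9b34fb");
        private static final UUID DISCONNECT =
                UUID.fromString("00002223-0000-1000-8000-00805f9b34fb");

        // Write the "disconnect" characteristic first (the value is ignored), then
        // close the GATT connection once the write has completed.
        public static void disconnect(BluetoothGatt gatt) {
            BluetoothGattService service = gatt.getService(SERVICE);
            BluetoothGattCharacteristic disconnectCharacteristic =
                    service.getCharacteristic(DISCONNECT);
            disconnectCharacteristic.setValue(new byte[] { 0 });
            gatt.writeCharacteristic(disconnectCharacteristic);
            // In the real application, call gatt.disconnect() from onCharacteristicWrite().
        }
    }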

Friday, October 3, 2014

Award for our Android Gas Sensor

I just got a mail that our gas sensor entry (a gas sensor with Bluetooth Low Energy connectivity and the associated Android application) won 3rd place in the We Know RFDuino contest. Thanks to everyone who viewed our video and thus helped us compete successfully! Meanwhile the source code of the prototype has been made open source, so you may want to check that out too!

Monday, September 29, 2014

Gas sensor prototype explained

The "We know RFDuino" contest has not ended yet, but it is close enough to the end that I can explain our prototype application. Our entry is a Bluetooth Low Energy-connected gas sensor and it is presented in the video below. Make sure you watch it; every view helps us win the competition.



The prototype demonstrates a unique capability of Bluetooth Low Energy device advertisement messages: you can embed user data into these broadcasts. This comes in handy if you just want to send out some measurement data to whoever cares to listen, without creating a session between the BLE client and server. This broadcast-type data transfer can support an unlimited number of clients with very low energy consumption on the sensor side.

Click here to download the Android client application project.

Click here to download the RFDuino source code.

The prototype works as follows. The microcontroller presented in the video measures the Lower Explosive Limit (LEL) and sends this value to the RFDuino microcontroller over a super-simple serial protocol. A message of this protocol looks like this:

0xA5 <seq_no> <LEL%>

where seq_no is an increasing sequence number and LEL% is the measured Lower Explosive Limit value. The microcontroller code is not shared here but you get the idea. The RFDuino code receives the LEL% value over the serial port it creates on GPIO pins 3 and 4, builds a custom data structure for BLE advertisements consisting of the site ID and the LEL% value, then starts advertising. This is performed cyclically so the LEL% value in the sensor's BLE advertisement is updated every second.

Now let's see what happens on the Android side. This is a non-trivial application with multiple activities but the Real Thing (TM) happens in the MapScreenActivity, in the onLeScan method. This method is called every time the Android device's BLE stack discovers a device. Here we check whether the device's name is "g" (this is how we identify our sensor) and we retrieve the LEL% data from the advertisement packet. We also use the Received Signal Strength Indicator (rssi) value for proximity indication. Bluetooth device discovery is restarted every 2 seconds so that we can retrieve the latest LEL% value. The rest is just Plain Old Android Programming.
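For reference, the core of that callback could look like the sketch below. It uses the pre-Lollipop scanning API that the project targets; the exact position of the LEL% byte inside the advertisement is my assumption, so check the attached project for the real layout.

    import android.bluetooth.BluetoothAdapter;
    import android.bluetooth.BluetoothDevice;

    public class GasSensorScanner {

        final BluetoothAdapter.LeScanCallback leScanCallback = new BluetoothAdapter.LeScanCallback() {
            @Override
            public void onLeScan(BluetoothDevice device, int rssi, byte[] scanRecord) {
                if (!"g".equals(device.getName())) return;   // identify our sensor by its name
                Integer lel = extractLel(scanRecord);
                if (lel != null) onReading(lel, rssi);       // rssi doubles as a rough proximity hint
            }
        };

        // Walk the standard advertisement structures (length, type, payload) and pull
        // the payload of the manufacturer-specific structure (type 0xFF). The assumed
        // payload layout is [site ID][LEL%].
        static Integer extractLel(byte[] scanRecord) {
            int i = 0;
            while (i < scanRecord.length - 1) {
                int len = scanRecord[i] & 0xFF;
                if (len == 0) break;
                int type = scanRecord[i + 1] & 0xFF;
                if (type == 0xFF && len >= 3) {
                    return scanRecord[i + 3] & 0xFF;
                }
                i += len + 1;
            }
            return null;
        }

        void onReading(int lelPercent, int rssi) {
            // update the map screen here
        }
    }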

The identification of the sensor and the encoding of the sensor data are obviously very naive but that is not really the point. You can make it as complex as you like; e.g. you can protect the sensor data with a hash and place that hash into the advertisement as well, so that the receiver can make sure it gets data from an authorized sensor and not a fake one. The important thing is that the framework is flexible enough that relatively complex functionality can be implemented, and RFDuino really simplifies sensor programming a lot.

If you enjoyed the example application, make sure you watch the video (many times, if possible :-)) and if you happen to be in London on 19 November 2014, you might as well come to the Londroid meetup where I will present this and another BLE project (a connection-oriented one, called MotorBoat).

Saturday, September 6, 2014

Camera shot on charger connection

Somebody approached me with the idea of turning a cheap Android phone into an automatic camera. Some external sensor would send a signal to the phone and the phone would take a picture automatically. We started to discuss how the external sensor could be connected and an interesting idea came up: the charger connection.

Android delivers an event whenever the charger is connected or disconnected: can this be used to send a binary signal to an application in a very simple way, without fiddling with Bluetooth or USB?
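The mechanism itself is just a broadcast receiver registered at runtime for the power events; the sketch below shows the idea (takePicture() stands in for whatever camera code the example application actually uses).

    import android.content.BroadcastReceiver;
    import android.content.Context;
    import android.content.Intent;
    import android.content.IntentFilter;

    public class ChargerTrigger {

        private final BroadcastReceiver receiver = new BroadcastReceiver() {
            @Override
            public void onReceive(Context context, Intent intent) {
                if (Intent.ACTION_POWER_CONNECTED.equals(intent.getAction())) {
                    takePicture();   // hypothetical hook into the camera code
                }
            }
        };

        public void register(Context context) {
            IntentFilter filter = new IntentFilter();
            filter.addAction(Intent.ACTION_POWER_CONNECTED);
            filter.addAction(Intent.ACTION_POWER_DISCONNECTED);
            context.registerReceiver(receiver, filter);
        }

        void takePicture() {
            // open the camera and take a shot
        }
    }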

Click here to download the example application.

You have to start the application once. Then whenever you connect the charger, it takes a picture. When the application is in the foreground, a preview is shown but as long as the application is active (not destroyed) it works from the background too.

Here are the experiences:

  • On my high-end device the application reacted quickly to charger connection, the reaction time from connecting the charger to the camera shot was less than a second. But when the application was tested on the very low-end Android target device, the picture was much less rosy: the delay increased to 3-4 seconds, effectively making the solution unusable.
  • In order for this application to work, it has to be started at least once manually. This pretty much kills all unattended use cases.
  • The shutter sound is almost impossible to remove. Update: on certain devices (Nexus 4 and Nexus 7 confirmed) there is no shutter sound in silent mode.
The takeaway for us was to reject the idea, but I am sharing the example program anyway; maybe it will be useful for somebody.


Tuesday, September 2, 2014

Android gas sensor application with Bluetooth Low Energy/RFDuino

I have always had a fascination with sensors linked to mobile devices, so a competition seemed like a good opportunity to try out the latest fashionable technology in the area, Bluetooth Low Energy. SemiconductorStore.com announced the "We know RFDuino" competition for applications of the RFDuino module. RFDuino is an Arduino module with Bluetooth Low Energy (BLE) support. It is ideal to act as an interface between a sensor and a BLE-enabled mobile device like the Nexus 7.

Eventually I will publish the entire source code of this prototype application on this blog, but as this is a contest, I will wait until it ends (Sept. 30). Till then, watch the (very amateurish) video we have prepared about our sensor and the Android application. The entry with the most views wins the contest, so if you like the concept, share the video with others! Thanks in advance. :-)


Tuesday, March 11, 2014

Beyond RenderScript - parallelism with NEON


My last post about the parallel implementation of the Dynamic Time Warping (DTW) algorithm was a disappointment. The RenderScript runtime executed the parallel implementation significantly slower than the single-core implementation (also written in RenderScript). It turned out that parallelizing the processing of 10000-50000 element vectors over multiple cores was not worth the cost of multi-threaded processing and all the overhead that comes with it (threads, semaphores, etc.). Each core must be allocated a significantly larger workload, but our DTW algorithm cannot generate such large, independent workloads because the rows of the DTW matrix depend on each other. So in order to exploit RenderScript multi-core support, it is best to have an algorithm whose output depends only on the input and not on some intermediate result, because this type of algorithm can easily be sliced up across multiple cores.

It would have been such a waste to discard our quite complicated parallel DTW algorithm, so I turned to other means of parallel execution. Multi-core is one option, but the ARM processors in popular Android devices have another parallel execution engine internal to the core: the NEON execution engine. One NEON instruction can process 4 32-bit integers in parallel (see the picture below). Can we speed up DTW fourfold with this option?



NEON is actually quite an old technology; even the Nexus One was equipped with it. It is therefore much more widely deployed than multi-core CPUs. While ordinary applications can take advantage of multi-core CPUs (e.g. two processes can execute in parallel on two cores), NEON programs are difficult to write. Although some compilers claim the ability to generate NEON code and template libraries are available, the experience is that the potential performance benefits cannot be exploited without hand-coding in assembly, and that is not for the faint-hearted.

The example program is attached at the end of this post. You have to be logged in to the Sfonge site to access it.

The relevant functions are in jni/cpucore.c. There are 3 implementations, processNativeSlow, processNative and processNativeNEON, each progressively more optimized than the previous one. The processNativeSlow and processNative functions are in C; in processNativeNEON the most time-critical loop (the "tight loop") is implemented entirely in mixed ARM/NEON assembly. This tight loop produces 4 result elements in parallel, so we expect a huge performance gain over the single-core RenderScript implementation (dtw.rs).

The measured results tell a different story. The NEON implementation is significantly faster on small datasets, but one second of voice is 8000 samples, so data sizes grow quickly. On 10-second datasets (80000 samples, a 6.4 billion element DTW matrix) the simple nested-loop C99 implementation and the complex, hard-to-understand NEON implementation produce about the same execution time.

How is this possible? Let's take the example of 10-second reference and evaluation samples. This means 80000 elements each, i.e. 80000*80000 = 6.4 billion values to calculate. Calculating each value requires accessing 20 bytes: 2 input samples (2 bytes each), 3 neighbor cells (4 bytes each) and the stored result (4 bytes). A1 SD Bench measures 800 Mbyte/sec copying performance on my Galaxy Nexus (and similar values on the two cheap Android tablets that the family has); copying obviously means 2 accesses per byte (one read and one write). For simplicity, let's assume that reads and writes take about the same time. According to this very rough calculation, the memory accesses themselves take about 80 sec. The real execution time is about 120 sec; the difference can be explained by the simplifications. The cache does not really help because of the large data size. The performance is determined by the RAM speed and the simplest single-core implementation already hits this bottleneck. All the wizardry with parallelism is futile.
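The back-of-the-envelope arithmetic can be checked quickly; the little program below reproduces the roughly 80-second estimate from the numbers quoted above (these are my own rounded figures, not measurements from the attached project).

    public class DtwBandwidthEstimate {
        public static void main(String[] args) {
            long samples = 80000L;                    // 10 s of 8 kHz audio
            long cells = samples * samples;           // 6.4 billion DTW cells
            long bytesPerCell = 2 * 2 + 3 * 4 + 4;    // 2 inputs, 3 neighbors, 1 result = 20 bytes
            double copyMBps = 800.0;                  // A1 SD Bench copy throughput (read + write)
            double accessBytesPerSec = 2 * copyMBps * 1000000.0;
            double seconds = (cells * bytesPerCell) / accessBytesPerSec;
            System.out.printf("Estimated memory time: %.0f s%n", seconds);   // ~80 s
        }
    }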

Obviously the case was not helped by the choice of the DTW algorithm as the benchmark, which intentionally does not fit into the class of algorithms normally used to demonstrate the benefits of parallel processing. Grayscale conversion would be a better fit (one read, one write and 3 multiplications per pixel). But this means that you actually have to be really lucky with your algorithms for these parallel options to speed up your code significantly. Even then, it is worth looking at the parallel options inside the core before going multi-core. And you definitely should not forget the auxiliary costs of parallel computation, e.g. distributing/gathering the data to/from the parallel processing units, or whether other hardware (e.g. memory) is able to keep pace with the CPU.

One wild idea at the end: could the RenderScript computation model be used to generate NEON code? With some limitations, the answer is probably yes.

Friday, February 7, 2014

RenderScript in Android - the parallel version

In the previous post I promised to revisit the parallel case. The big promise of RenderScript is to exploit parallelism among the different CPUs, GPUs and DSPs in the device at no additional cost. Once the algorithm is properly transformed into a parallel version, the RenderScript runtime grabs whatever computing devices are available and schedules the subtasks automatically.

The problem with DTW is that it is not trivial to parallelize. Each cell in the matrix depends on the cells at (x-1,y), (x-1,y-1) and (x,y-1) (provided that the cell to calculate is at (x,y)). If the matrix is traversed horizontally or vertically, at most two rows (one horizontal and one vertical) can be evaluated in parallel.
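As a reminder, this is the recurrence that creates those dependencies; a plain serial Java version is shown below just to make the access pattern explicit (the real project implements it in RenderScript and C).

    // Serial DTW matrix fill: every cell needs its left, upper and upper-left neighbors.
    static int[][] dtw(short[] a, short[] b) {
        int n = a.length, m = b.length;
        int[][] d = new int[n][m];
        for (int x = 0; x < n; x++) {
            for (int y = 0; y < m; y++) {
                int cost = Math.abs(a[x] - b[y]);
                if (x == 0 && y == 0)      d[x][y] = cost;
                else if (x == 0)           d[x][y] = cost + d[x][y - 1];
                else if (y == 0)           d[x][y] = cost + d[x - 1][y];
                else d[x][y] = cost + Math.min(d[x - 1][y - 1],
                                      Math.min(d[x - 1][y], d[x][y - 1]));
            }
        }
        return d;   // d[n-1][m-1] is the DTW distance
    }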

Michael Leahy recommended a paper that solves this problem. The algorithm traverses the matrix diagonally. Each diagonal depends only on the two previous diagonals, but the cells within one diagonal do not depend on each other. One diagonal can then be handed to RenderScript to iterate over. The picture below illustrates the concept.



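In plain Java the diagonal ("wavefront") traversal looks roughly like the sketch below; the inner loop is the part that can be handed to a parallel kernel. This is an illustration of the traversal order only, not the project's RenderScript code.

    // Fill the DTW matrix anti-diagonal by anti-diagonal; cells on one diagonal are independent.
    static void dtwDiagonal(int[][] d, short[] a, short[] b) {
        int n = a.length, m = b.length;
        for (int diag = 0; diag <= n + m - 2; diag++) {
            int xStart = Math.max(0, diag - m + 1);
            int xEnd = Math.min(n - 1, diag);
            // This loop is the parallelizable kernel: no cell here reads another cell
            // of the same diagonal.
            for (int x = xStart; x <= xEnd; x++) {
                int y = diag - x;
                int cost = Math.abs(a[x] - b[y]);
                int best;
                if (x == 0 && y == 0)      best = 0;
                else if (x == 0)           best = d[x][y - 1];
                else if (y == 0)           best = d[x - 1][y];
                else best = Math.min(d[x - 1][y - 1], Math.min(d[x - 1][y], d[x][y - 1]));
                d[x][y] = cost + best;
            }
        }
    }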
The example program can be downloaded at the end of this post. You have to be logged in to the Sfonge site to access it.

You will notice that there are two parallel implementations. findReferenceSignalC99Parallel() is the "proper" implementation that closely follows the RenderScript tutorial. Here the diagonals are iterated in Java and only the parallel kernel is implemented in RenderScript. This version - even though it is functional - is not invoked by default because it delivers completely unacceptable performance on my 2-core Galaxy Nexus. By looking closely at the execution times, I concluded that even though RenderScript runtime invocations (copying into Allocations and invoking forEach) are normally fast, sometimes very innocent-looking invocations (like copying 5 integers into an Allocation) can take about a second. This completely ruined this implementation's performance.

The other parallel implementation, which is actually invoked and whose performance is compared to the 1-core RenderScript implementation (the fastest one), is findReferenceSignalC99ParallelRSOnly(). This version is implemented entirely in RenderScript. Unfortunately it is 2-2.5 times slower than the 1-core implementation. How can that be?

First, if you compare dtw.rs and dtwparallel2.rs, you will notice that the parallel implementation is considerably more complex. Indexing into those varying-length diagonals takes a bit of fiddling, while the 1-core implementation can take advantage of fast pointer arithmetic to move from cell to cell sequentially. So the parallel implementation starts with a handicap, and this handicap is not compensated for by the 2 cores of the Galaxy Nexus.

OK, the Galaxy Nexus is stone age, but what happens on a 4-core processor like the one in the Nexus 4? The runtime does launch with 4 cores, but then the Adreno driver kicks in and the result is that the parallel implementation is about 3 times slower than the serial one. What happens in the driver I don't know; as far as I can see, the source code is not available.

Jason Sams recommended disabling the GPU driver with
adb shell setprop debug.rs.default-CPU-driver 1
but I decided to stop my adventures here. The conclusion I drew for myself is that RenderScript in its present form is not ready for parallel programming. Clang-LLVM is a very promising compilation technology, but the parallel runtime suffers from a number of problems. IMHO, there should be a way to programmatically control how workloads are allocated to CPUs/GPUs. Until then, if you want to harness the power of your multicore processor, code the parallel part yourself, using RenderScript for the serial code if you wish.