Monday, December 5, 2016

The elements of learning to bike

As an accomplished father who just taught his 5yo to ride a bike in 4 sessions of 30-45 minutes each, I can unequivocally say that starting with the pedals removed is indeed a quicker way to learn. While I did not consult it before teaching my daughter, I looked it up before writing this post, and I can say that this post by REI documents the process well.

Here are the implicit rules my daughter followed while learning how to bike:

  1. If the bike went too close to parked cars, she'd stop.
    Implicit rule: don't touch things that do not belong to you. She has no problems running into me with her scooter.
  2. If the bike went too fast for her comfort, she'd drag her feet and bring it to a stop.
    Implicit rule: uncontrolled speed may cause injury.
  3. When there was little room to steer between people on one side and a tree on the other, she'd steer closer to the tree.
    Implicit rule: don't hurt a human.
If something unexpected happened, such as a fall, she'd take a break before she could be convinced to start again. This seems to be a critical aspect of learning - fail, pause to internally assess what went wrong, and retry with a plan.
I'll close with her picture - the sheer joy of learning something new is quite apparent!
Regards,
Kuntal.
PS: I wrote this back in July 2016 but didn't get around to publishing it.

Friday, September 9, 2016

Halide for ADAS

Luggage space will decide the winner of the autonomous vehicle race.

Automakers have two choices for meeting the sensor fusion compute requirements of autonomous vehicles - take up all the trunk space with server-grade computers OR optimize processing so that the compute fits in a pizza box and leaves room in the trunk for luggage.

As I described in my earlier post on ADAS framework challenges, optimizing sensor processing tasks is a time-intensive process and may well be the rate-determining factor in deciding how the trunk gets used.

I have looked at the image processing frameworks available in the industry, and I have come to the conclusion that Halide is the most suitable framework for quickly connecting sensor processing algorithms and experimenting with their optimal load distribution on an SoC. I encourage system designers to learn about Halide directly from the link above. The key features that make Halide attractive as an ADAS sensor compute framework are:

  • Split algorithm from implementation
    • This allows the algorithm designer to focus on the algorithm and not on how to optimally construct it for a particular SoC.
  • Enable iterating the implementation over cores, threads, rows, tiles, and SIMD widths
    • An SoC expert can take a working algorithm and then experiment with various load distributions without worrying about changing the algorithm. Thus, if a particular SoC has a mix of CPU, GPU, DSP, and vision-specific accelerators, the implementation can quickly take advantage of these elements (see the sketch after this list).
  • Function-call type pipeline constructs
    • Frameworks such as OpenVX mistakenly focus on pipeline connections and conformance as the problems to be solved, but these are rather trivial compared with the bigger problem of quick experimentation with load distribution.
  • Defined interfaces between Halide and C++ based functions such as that in OpenCV
    • Halide isn't suitable for control-intensive algorithms such as finding contours, but one can easily integrate OpenCV with Halide (see the second sketch below).
  • The ability to visualize performance
    • This is an often overlooked aspect of optimization. Halide has a built-in, elegant way of visualizing performance.
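To make the first two bullets concrete, here is a minimal sketch in Halide's C++ embedded DSL. The 3x3 blur is just a stand-in for a real sensor-processing kernel, and the particular tile size and vector width are illustrative assumptions, not recommendations:

```cpp
// Minimal Halide sketch: the algorithm is written once, and only the schedule
// lines change when retargeting to a different core count, tile size or SIMD width.
#include "Halide.h"
using namespace Halide;

int main() {
    ImageParam input(UInt(8), 2);          // e.g. a grayscale camera frame
    Var x("x"), y("y"), xi("xi"), yi("yi");

    // Algorithm: a 3x3 box blur, expressed purely functionally.
    // Boundary handling is declared once; no manual edge-case code.
    Func clamped = BoundaryConditions::repeat_edge(input);
    Func blur_x("blur_x"), blur_y("blur_y");
    blur_x(x, y) = (cast<uint16_t>(clamped(x - 1, y)) +
                    cast<uint16_t>(clamped(x, y)) +
                    cast<uint16_t>(clamped(x + 1, y))) / 3;
    blur_y(x, y) = cast<uint8_t>((blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3);

    // Schedule: tile, vectorize and parallelize. An SoC expert experiments here
    // (thread counts, vector widths, or gpu_tile for a GPU target) without
    // touching the algorithm above.
    blur_y.tile(x, y, xi, yi, 256, 32)
          .vectorize(xi, 8)
          .parallel(y);
    blur_x.compute_at(blur_y, x).vectorize(x, 8);

    blur_y.compile_jit();                  // or compile_to_* for ahead-of-time deployment
    return 0;
}
```
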
Finally, although the Halide run-time and tools haven't yet been qualified for ISO 26262, the way the language is constructed makes it well suited for ADAS usage nonetheless. Image boundaries (a common source of bugs) are handled syntactically, and bounds inference on input and output buffers is automated.
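And to illustrate the OpenCV interface point, here is a hedged sketch of calling an ahead-of-time-compiled Halide stage from OpenCV code; lane_filter and its header are hypothetical placeholders for whatever data-parallel stage you choose to express in Halide:

```cpp
// Hedged sketch of mixing OpenCV and Halide. A single-channel, continuous cv::Mat
// can be wrapped in a Halide::Runtime::Buffer without copying.
#include <opencv2/opencv.hpp>
#include <vector>
#include "HalideBuffer.h"
#include "lane_filter.h"   // hypothetical ahead-of-time-compiled Halide pipeline

void process_frame(const cv::Mat& gray_in, cv::Mat& gray_out) {
    CV_Assert(gray_in.type() == CV_8UC1 && gray_in.isContinuous());
    gray_out.create(gray_in.rows, gray_in.cols, CV_8UC1);

    // Zero-copy views over the OpenCV data.
    Halide::Runtime::Buffer<uint8_t> in(gray_in.data, gray_in.cols, gray_in.rows);
    Halide::Runtime::Buffer<uint8_t> out(gray_out.data, gray_out.cols, gray_out.rows);

    lane_filter(in, out);                  // data-parallel part runs in Halide

    // Control-intensive steps stay in OpenCV, e.g. contour extraction.
    cv::Mat work = gray_out.clone();
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(work, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
}
```
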

All said and done, Halide offers a faster path to move from server-sized compute to optimized compute in a smaller form factor. I'd rather buy a car that I can load my luggage into. 


And it might be a while until my next post.
Kuntal. 

Wednesday, June 1, 2016

4-way intersections with autonomous cars

Currently we have well-defined rules for when cars approach a 4-way intersection.
The question is, will autonomous cars follow the same rules? Is a full stop needed?
Here's an animation I made with Moovly to illustrate how cars could pass through a 4-way intersection without stopping. Sometimes, when the signals are broken and flash red, I wish we could do this manually!


I think that even fully autonomous vehicles will be expected to stop at an intersection. Here's why:
"Normal" autonomous use cases involve keeping a safe distance, path planning, reading traffic signs, etc. All of the algorithms providing these can have a reasonable "sensor to actuation" time.
At an intersection, these margins essentially get divided by the number of vehicles entering the intersection potentially stressing the peak performance of the algorithms and impacting safety negatively. Another reason could be to deliberately limit path co-operation rules to control algorithmic complexity.
A safer alternative would be to follow the following rules:

  1. Stop at an intersection
  2. Do not proceed if there is a vehicle in the intersection
  3. Negotiate with all cars at the intersection before proceeding. This step can be designed to handle situations where a mix of human-driven and autonomous vehicles share the road (a rough sketch of these rules follows below).
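Purely as an illustration (not a real V2X negotiation protocol), the three rules might be sketched as a per-vehicle state machine along these lines; every type and the consensus signal here are hypothetical placeholders:

```cpp
// Illustrative sketch only: the three rules above as a simple state machine.
#include <vector>

enum class State { Approaching, Stopped, Negotiating, Crossing };

struct Vehicle { int id; bool in_intersection; };

struct IntersectionController {
    State state = State::Approaching;

    // Called every control cycle with what the ego vehicle currently perceives.
    void step(const std::vector<Vehicle>& others, bool all_agree_ego_goes_next) {
        switch (state) {
            case State::Approaching:
                state = State::Stopped;                 // Rule 1: stop at the intersection
                break;
            case State::Stopped: {
                bool occupied = false;                  // Rule 2: wait if a vehicle is inside
                for (const auto& v : others) occupied = occupied || v.in_intersection;
                if (!occupied) state = State::Negotiating;
                break;
            }
            case State::Negotiating:                    // Rule 3: negotiate right of way
                if (all_agree_ego_goes_next) state = State::Crossing;
                break;
            case State::Crossing:
                break;                                  // hand off to the path planner
        }
    }
};

int main() {
    IntersectionController ego;
    ego.step({{2, false}}, false);   // approaching -> stopped
    ego.step({{2, false}}, false);   // intersection clear -> negotiating
    ego.step({{2, false}}, true);    // consensus reached -> crossing
}
```
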

BTW, I have renamed my blog to Level4 since I find that I'm exclusively writing about autonomous vehicles and not about mobile and other technologies.
Till next time,
Kuntal.

Thursday, May 19, 2016

Machine Learning - more 5 year old milestones

Sundar spoke about the expected impact of machine learning at I/O. I'd like to put forth my observations of my 5yo daughter, who has just learnt to read, continuing from my earlier post on the topic. My sensitivity to this topic has been heightened as I work on autonomous driving in my day job.

So here we are reading Dick and Jane, which is still an excellent book for beginner readers.


As you progress through the book, you get introduced to the family. Here are their pets, on the page to the right:


The cat's name is Puff, the teddy bear is Tim and the dog's name is Spot.

At one point in the book my daughter sees these pictures:


Without reading the text, my daughter said that it looked like they were making Spot and Puff out of PlayDoh. She thought Sally (the girl in the middle) was going to make a Tic Tac Toe game. She was thrilled when she read the text and found that they were indeed making their pets!

She effortlessly applied the memory of her own past experience of making things with PlayDoh. She chose to label the pets with names, not leaving them as generic "dogs" and "cats".

After seeing the final Tim, she shook her head and said that it didn't quite look like him.


I think it's still pretty hard for CV algorithms to get to this point without significant human help.
To add to it, she saw the text in the following image and said that it looked like an upside-down staircase. This sort of lateral thinking is probably even further off than teaching AlphaGo to play chess.



That's all for today.
Kuntal. 

Wednesday, March 9, 2016

ADAS framework challenges

After studying the computer-vision-based ADAS solution providers, I've noticed one thing that stands out. A lot of vendors have excellent demos of individual algorithms, such as lane detection, pedestrian detection, etc. But when asked if they're putting it all together, a common answer is either "we're working on it" or "we have a fixed set of algorithms running together." I believe that putting algorithms together in an efficient manner will be a fundamental challenge to be resolved for mass-market, low-cost, low-power ADAS.

Here's what I see as the primary challenges that need to be solved:

  1. Syntactic expression of pipelines that enables re-use of processing blocks (see the sketch after this list)
  2. There are too many technologies to learn for a developer:
    1. Algorithms
    2. Operating systems (for scheduling the algorithms to take advantage of multi-core architectures)
    3. SIMD / Vectorization (for taking advantage of special architectures for vision processing)
    4. GPU usage - either for simple parallelism or with a deep learning framework.
  3. The need to iterate quickly - as algorithms get updated and workloads shift with the addition of custom hardware.
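As a hedged illustration of challenge 1, here is one shape the "syntactic expression of pipelines" could take. The Pipeline type and the processing blocks are hypothetical; the point is only that blocks composed behind a common interface can be re-used and re-scheduled without rewriting their neighbors:

```cpp
// Hypothetical sketch: processing blocks behind a common call signature, composed
// declaratively into a pipeline. None of these names refer to a real framework.
#include <functional>
#include <vector>

struct Frame { int width = 0, height = 0; std::vector<unsigned char> pixels; };

// Every block takes a Frame and produces a Frame, so blocks can be swapped or
// re-ordered without touching their neighbors.
using Stage = std::function<Frame(const Frame&)>;

struct Pipeline {
    std::vector<Stage> stages;
    Pipeline& then(Stage s) { stages.push_back(std::move(s)); return *this; }
    Frame run(Frame f) const {
        for (const auto& s : stages) f = s(f);
        return f;
    }
};

// Placeholder blocks; in practice each would wrap an optimized kernel that can be
// moved between CPU, GPU or DSP without changing this composition code.
Frame undistort(const Frame& f)    { return f; }
Frame detect_lanes(const Frame& f) { return f; }
Frame detect_peds(const Frame& f)  { return f; }

int main() {
    Pipeline p;
    p.then(undistort).then(detect_lanes).then(detect_peds);
    Frame out = p.run(Frame{1280, 720, std::vector<unsigned char>(1280 * 720)});
    (void)out;
}
```
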
I built the audio concurrency framework that's been running on all of Qualcomm's 7K (from the first Android phone onwards), 8K, and 9K platforms. The challenges there were audio and voice concurrency, the need to handle different sampling rates, and the long list of post-processing blocks. I believe we solved them elegantly, with the right balance of re-configurability and simplicity, on resource-constrained (MHz and memory) platforms. I see similar challenges in CV-based solutions for ADAS.

It will be a fun ride.
Kuntal. 



Friday, January 29, 2016

Is your Machine Learning algorithm smarter than a 5 year old?

I'd like to offer some thoughts on how "learning" differs between the new poster child for machine learning - "Convolutional Neural Networks" and my 5 year old daughter, who is learning how to read. If you want to skip the rest of the post, here's the conclusion - fears of AI taking over the world are vastly overblown.

While reading, she started out reading words one letter at a time, not restarting for every syllable. Now, when she reads, she groups more letters together and then reads them aloud one group at a time. Once she has read the written pronunciation aloud, she maps it to the colloquial pronunciation with some hesitation. When she understands the context, there is less hesitation. Also, she has now become faster with sight words - familiar words which we can read without sounding out the spelling. When she began, she wouldn't remember which words she had already read, so it would be a new exercise every time. Another interesting observation: when reading "3-D," she ignored the "-" at first, read "3D," and then asked what the "-" was.

I showed her a puzzle book where one page had upper-case letters printed on tea cups and lower-case letters printed on saucers. Simply by looking at the page, she figured out that the goal was to match the upper- and lower-case forms, and she started drawing lines from the tea cups to the saucers. She also noticed, and said aloud, that since the colors of some of the saucers were the same, you don't have to match by color.

Then we flipped a few pages and found different pictures of two objects on two sides of a scale. Her question was - do I circle the heavier side or the lighter side?

Flip a few more pages and there is a picture of a maze. She exclaimed "Oh, I love to do mazes" and traced a path from a point that she thought was the starting point to the end. She didn't notice the start arrow at first, so she simply built a path that made sense to her. Then, when I told her to start from the arrow, she said "oh" and effortlessly redrew the path.

In the last few days I've learnt about CNNs. I have a DSP background and have known about "traditional" ML methods such as Extended Kalman Filters for a while, but CNNs were new to me. The hierarchical representation does present a nice conceptual simplification of learning and classification. However, when I think about how rigorous the training procedure for image classification is, compared to how my daughter learns to read, I can safely draw the conclusion that we will be able to harness AI to our benefit. Conscious harm from an AI entity is in the far-off future.

And I'll teach her to play Go. Watch out, Google.

Later,
Kuntal.