I mentioned briefly in a previous post that products like Amazon’s Alexa and Google Home have ethical consequences depending on how far we push them. So, how far can we push them? Based on my ethics post, you may have realized that I am not a big proponent of AI assistants, but I do not want to write a post convincing you of that. Rather, what if we were to look at these current devices from a more capitalistic angle? Ethics aside, what if we put business first and took home assistants further?

As a reminder, everything past this point is simply a set of ideas that may or may not be feasible in today’s world. Rather, these are discussion points that could help us understand where AI may be heading in our homes.

Home Assistants Today

Looking at what we have today, these devices are essentially microphones and speakers attached to a computer. They listen 24/7, waiting for you to say their “wake” word: the command that instructs the device to start listening to you and process your command. When you query the device, it takes your voice, processes it locally or via the cloud, and then stores this information for the company to use later, primarily to train the AI assistant further.1 We can assume that this data is also used to help profile customers.
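As a rough sketch of that flow, the loop below simulates a device that “hears” every chunk of audio but only forwards what follows a wake word. The string-based audio, the wake word “computer,” and all names here are illustrative stand-ins, not any vendor’s real API.

```python
# Toy wake-word loop: the device receives every chunk but only the
# chunk spoken right after the wake word is treated as a command.
WAKE_WORD = "computer"

def run_assistant(chunks):
    """Return only the queries spoken after a wake word."""
    processed = []
    awake = False
    for chunk in chunks:
        if awake:
            processed.append(chunk)  # this chunk is the user's command
            awake = False            # go back to passively waiting
        elif WAKE_WORD in chunk.lower():
            awake = True             # wake word heard; capture what comes next

    return processed

# Everything said before the wake word is heard but never processed.
heard = ["private chat", "hey computer", "what is the weather", "more chat"]
print(run_assistant(heard))  # ['what is the weather']
```

The point of the sketch is the asymmetry: three of the four chunks reach the microphone, but only one ever reaches “processing.”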

The Give and Take Scenario

To me, a sensible businessman today understands that just selling a product to a customer isn’t enough. You need to create a give-and-take scenario between your business and your customer. In our case, we offer an AI assistant to make shopping, checking the weather, controlling your house, and so on easier, and in return we take the data we collect from you and use it to improve our business. That may include selling this information to third parties in order to turn a profit–it essentially doesn’t matter, as long as we are not wasting valuable customer information and we use it to make more money.

Customer information is an endless flow of profit. In our homes we talk about everything from our personal lives and problems to the experiences and problems we have with the products we use. The issue, as a company providing an AI assistant, is that the line is drawn at the wake word. While a device like Amazon’s Alexa is constantly listening, it isn’t processing what we are saying; it is simply waiting for us to say “Alexa.” There’s a wealth of information in everything said before that wake word is spoken and after the query ends. What customers say around Alexa can have extreme value, so being able to capture and process that information would be something big businesses would love.

Taking AI Further

AI can already attach to the devices we use to enhance our lives. Look at the smart light bulbs, vacuum bots, and smart locks on the market. Of course there are many other devices that take advantage of your home AI assistant, but these things already exist today. While I’m sure other companies will figure out ways to make these experiences better and more personal, I am not going to go down that rabbit hole. I want to focus on the concept of mining user data. If I am a company that has designed a home AI assistant, I know that giving my AI assistant’s API to other companies allows me to entrench myself in different markets. It would also give me data I can use to further profile my customers.

To paint a better picture of what I am trying to convey, let’s take a device as simple as a smart light bulb. You hook it up to my AI assistant and then say something like “Hey Computer, turn on the lights.” You would do this repeatedly over time, including giving the command to turn off the lights. In return, I collect how often and for how long you leave the lights on or off. With this I can train an algorithm to look for different things. For example, if I see that you consistently turn the lights on between 3AM and 4AM, my algorithm may investigate whether you are sleeping well. I could then add to your customer profile that you have trouble sleeping. In time, when my algorithm is confident enough, and assuming I run a marketplace like Amazon, I can offer you products to help you sleep better. Even if my company didn’t run a digital marketplace parallel to my AI assistant like Amazon does, your pattern of turning the lights on and off alone tells me a lot about you, and I can sell this information to marketers who would love to know how you sleep at night.
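The light-bulb scenario above can be sketched in a few lines: from a log of timestamped on/off events, flag a customer whose lights repeatedly come on between 3AM and 4AM. The five-night threshold and the log format are arbitrary assumptions for illustration, not a real profiling system.

```python
from datetime import datetime

def flag_poor_sleep(events, min_nights=5):
    """events: list of (iso_timestamp, action) where action is 'on' or 'off'.

    Returns True when the lights came on between 3AM and 4AM on at least
    min_nights distinct nights, i.e. a consistent trend worth profiling.
    """
    night_wakeups = set()
    for ts, action in events:
        t = datetime.fromisoformat(ts)
        if action == "on" and 3 <= t.hour < 4:
            night_wakeups.add(t.date())  # count each night only once
    return len(night_wakeups) >= min_nights

# Six consecutive nights of 3AM light use: the profile gets flagged.
log = [(f"2018-06-{d:02d}T03:3{d}:00", "on") for d in range(1, 7)]
print(flag_poor_sleep(log))  # True
```

Even this crude rule turns raw toggle timestamps into a marketable inference (“has trouble sleeping”), which is exactly the value the post describes.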

Now, all of this information would be gathered from time willingly spent with an AI assistant. As of today, AI assistants (as far as we know) only collect data when we are actively using them and the devices they are connected to. In my opinion, the next logical step is to capture the wealth of information that customers give away when they aren’t actively using an AI assistant.

Current Limitations

Before suggesting possible solutions, there are current constraints when it comes to processing speech. In the case of processing anything and everything a customer says to an AI assistant, the first limitation that comes to mind is bandwidth. Not all customers are equal, and we need to assume that some may buy an AI device with very limited internet access. Amazon Alexa does most of its processing via the cloud rather than locally. You can test this by disconnecting Alexa from your home internet and then asking a typical query like “what’s the weather like today?” Most likely, Alexa won’t be able to tell you.

The other limitation is processing power. Not only would we have to send large amounts of data over the wire to our data center, but we would also need to process it. Even if the processing isn’t local, if you have millions of customers’ voices to process, each with over five minutes of collected audio, processing what they are saying could be highly inefficient and time consuming. Any cloud platform in charge of handling speech recognition would quickly be overwhelmed.
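A back-of-envelope calculation shows the scale. The numbers below are assumptions chosen for illustration (16 kHz, 16-bit mono audio, one million customers, five minutes of audio each), not measurements of any real service.

```python
# Rough data-volume estimate for cloud speech processing at scale.
SAMPLE_RATE = 16_000   # samples per second (assumed)
BYTES_PER_SAMPLE = 2   # 16-bit mono audio (assumed)
CUSTOMERS = 1_000_000
MINUTES_EACH = 5

bytes_per_customer = SAMPLE_RATE * BYTES_PER_SAMPLE * 60 * MINUTES_EACH
total_gb = CUSTOMERS * bytes_per_customer / 1e9

print(f"{bytes_per_customer / 1e6:.1f} MB per customer")  # 9.6 MB per customer
print(f"{total_gb:,.0f} GB across all customers")         # 9,600 GB across all customers
```

Nearly ten terabytes of raw audio for a single five-minute slice of one million customers, before any recognition work has even started.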

Another issue is storing speech locally before sending it. For example, we may want to wait 5 or 10 seconds before sending what a customer said to the cloud, but if we are storing this data in memory, it can quickly build up. One solution is to build devices with more memory, but you can only add so much before you hit the same issue. People can have long conversations with no short pauses, so it would be easy to overwhelm an AI assistant trying to store a passive conversation.
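One common way to keep local storage bounded, sketched below, is a fixed-size ring buffer that holds only the most recent audio and silently drops the oldest data once full, instead of growing without limit. The chunk sizes and class name are illustrative assumptions.

```python
from collections import deque

class AudioBuffer:
    """Bounded buffer: keeps only the last `seconds` worth of audio chunks."""

    def __init__(self, seconds, chunks_per_second=10):
        # deque with maxlen automatically evicts the oldest chunk when full
        self.chunks = deque(maxlen=seconds * chunks_per_second)

    def write(self, chunk):
        self.chunks.append(chunk)

    def drain(self):
        """Hand everything off for upload and clear the buffer."""
        data = list(self.chunks)
        self.chunks.clear()
        return data

buf = AudioBuffer(seconds=1)  # holds at most 10 chunks
for i in range(25):           # a long conversation with no pauses
    buf.write(i)
print(buf.drain())            # only the 10 most recent chunks survive: 15..24
```

The trade-off is explicit: memory stays fixed, but anything older than the window is lost if the upload can’t keep up, which mirrors the problem described above.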

Active and Passive Listening

Here is my take on implementing an AI assistant that listens 24/7:

Passive Listening:

Rather than needing wake words to start processing customer commands, we instill two modes on the device: an active listening mode, where the customer uses a wake word and we allow longer wait times before processing their command, and a passive listening mode, where we use no wake words and shorter wait times before processing a command.

Let’s focus on the passive listening mode. One possible approach is to use the microphone and wait for an input in sound. This can be anything from footsteps to actual speech. The device would open a socket (think of a socket as a connection between a host and a client) and begin sending audio data to our data center. We then process the data and determine whether it has any value. This solution requires less memory usage locally, but may increase local CPU usage depending on how the socket is handled. We do need to worry about bandwidth, however, as someone with 2 – 5 Mb/s up and down could run into issues. We also need to take into account that we could be hogging network bandwidth. If our device is listening while our customer watches a 4K movie on Netflix, then we are uploading audio data from the movie while the movie itself streams on the same network. All of this, of course, depends on the customer’s bandwidth. If they have 50 Mb/s up and down, this is less likely to be an issue. Once the conversation is over and we no longer have audio data to send, we allow a one- or two-second grace period and then close our socket connection with our data center.
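The open-on-sound, close-after-silence behavior above can be modeled without real networking. In the sketch below, chunks are just sound levels (0 means silence) and the session bookkeeping stands in for opening and closing a socket; the two-chunk grace period is an assumed stand-in for the one- or two-second window.

```python
GRACE_CHUNKS = 2  # chunks of silence tolerated before closing (assumed)

def stream_sessions(chunks):
    """Group noisy chunks into upload sessions separated by silence.

    A session begins when sound is detected (a real device would open
    the socket here) and ends once GRACE_CHUNKS silent chunks pass
    (where the socket would be closed).
    """
    sessions, current, quiet = [], [], 0
    for level in chunks:
        if level > 0:
            current.append(level)  # stream this chunk to the data center
            quiet = 0
        elif current:
            quiet += 1
            if quiet >= GRACE_CHUNKS:  # grace period expired: close socket
                sessions.append(current)
                current, quiet = [], 0
    if current:  # conversation still going when input ends
        sessions.append(current)
    return sessions

audio = [0, 3, 5, 0, 4, 0, 0, 0, 2, 1]
print(stream_sessions(audio))  # [[3, 5, 4], [2, 1]]
```

Note how the single silent chunk inside the first burst does not close the session; only a silence longer than the grace period does, which keeps one conversation from being split across many connections.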

Processing Collected Data:

As data comes in from our AI devices, we need to begin understanding what our customers are saying. This can require an immense amount of processing power, as we do not know when conversations will end or how long they will be. One conversation could be 1 to 3 minutes while another may be over an hour. Having an algorithm comb through this data can be demanding.

One possible solution is to do the opposite. Rather than listening to each word in a conversation, we can pick out key words that are more important than others. Words like “and,” “the,” “it,” and “then” don’t tell us a lot, but words like “sick,” “broken,” “car,” and “refrigerator” are more meaningful. If we only look for key words and then process the conversation afterwards, we can reduce processing time and the amount of work demanded from our speech algorithm. Let’s use the following example:

Let’s say we have a customer who is getting over a cold, and we collected a short conversation with their spouse about it. A sentence like “I am having a miserable day, I have such a bad cold” gets sent to our speech processing algorithm, which picks out keywords based on how we trained it. Looking at the sentence as a human, the words that give us the most valuable information are “miserable day” and “bad cold.” If we assume our algorithm trims the sentence thoroughly, we would get the following: “miserable day bad cold.” With that information alone, we can deduce that our customer is having a bad day and has a cold. Our algorithm can then estimate the probability that this statement is true, begin recommending cold medicine, and keep count of how many times our customer has been sick. By doing this we reduced the sentence to process from 12 words to 4.
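A minimal version of that trimming is a stopword filter: drop the low-information words and keep the rest. The stopword list below is a tiny illustrative subset, not a trained model, but it reproduces the worked example above.

```python
# Low-information words to discard (illustrative subset, not exhaustive).
STOPWORDS = {"i", "am", "a", "an", "the", "have", "having", "such", "so", "it"}

def extract_keywords(sentence):
    """Strip punctuation, lowercase, and drop stopwords."""
    words = sentence.lower().replace(",", "").replace(".", "").split()
    return [w for w in words if w not in STOPWORDS]

print(extract_keywords("I am having a miserable day, I have such a bad cold"))
# → ['miserable', 'day', 'bad', 'cold']
```

Twelve words in, four out: the downstream algorithm now processes a third of the input while keeping the parts that carry profiling value.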

This processing only works as well as the algorithm is trained. Not everyone speaks in perfect sentences with a clear accent, and we can also lose data depending on the stability of a customer’s network. Understanding dialect and judging how true a statement is are both very important. We may even need to use data from other devices that our customer uses in order to reinforce our confidence that a sentence implies its apparent meaning. For example, “I am sick, I had a cold” may be processed as “sick cold,” but does that mean they are currently sick or were sick? We may need to pull data from devices connected to our AI assistant in order to be sure, or use previous data to calculate the probability that our customer currently has a cold. The last thing we want is for our devices to recommend products or sell data that is inaccurate. It creates bad experiences and a lack of trust, which in turn reduces revenue.
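One way to sketch that cross-device reinforcement is a simple Bayesian update: start from a prior belief that the customer has a cold, then sharpen it with each independent signal. Every probability below is a made-up illustrative number, and the signals are hypothetical.

```python
def update(prior, likelihood_if_sick, likelihood_if_not):
    """One Bayes update: P(sick | signal) given how likely the signal is
    under each hypothesis."""
    num = prior * likelihood_if_sick
    return num / (num + (1 - prior) * likelihood_if_not)

p = 0.10                   # prior: assumed base rate of having a cold
p = update(p, 0.80, 0.05)  # speech keywords included "sick cold"
p = update(p, 0.60, 0.20)  # hypothetical thermostat turned up unusually high
print(f"{p:.2f}")          # 0.84
```

The ambiguous “sick cold” alone leaves real doubt; a corroborating signal from a second device pushes the estimate high enough to act on, which is the reinforcement the paragraph describes.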


Like anything involving data collection, we need to ensure customer security. We also need to be careful with how much we profile a customer. There can be legal consequences if we collect a conversation about a customer’s medical issue, put it on their customer profile, and then sell that to a third party. Legally, we still have to comply with government regulations and laws. While medical data is an invaluable resource for companies, we also don’t want lawsuits from customers who lost health coverage due to a withheld medical condition given away by us. If we are storing full or partial customer information, it needs to be encrypted and stored properly. This also raises the question of whether we encrypt audio data before sending it through a socket or after processing it. Both options have advantages and disadvantages, as well as ethical concerns.


What I talked about above is only an idea and nothing more. There may be some gray areas in terms of performance and actual efficiency, but I believe it creates a good starting point. The point I’m trying to make, however, is that just because the AI assistants in our homes don’t listen to every single word we say now doesn’t mean they will always be that way. It may only be a matter of time before they do. When we come to a point where companies are essentially rooted in our personal lives, we need to remember that our addiction to convenience can have consequences that will affect us in the future. We need to ask ourselves how much we are willing to give to companies and organizations in return for a service.
