Microsoft patent reveals new plans to spy on users for better Bing search results

by Radu Tyrsina

Microsoft has received ample criticism over the past year for introducing features that could compromise user security, and to some extent we agree that the company crossed the line on some occasions, especially in the cases the EFF criticized. Microsoft’s response to accusations of collecting unnecessary user data hasn’t convinced anyone that it behaves otherwise. Now it looks like the company will draw even more criticism from customers if the feature described in its latest patent filing is ever switched on.

The company calls the technology in its patent filing “Query Formulation Via Task Continuum” and claims that it will make real-time sharing between apps easier and more convenient, allowing users to make more informed decisions while searching. For instance, search results could be improved if sufficient information about a user’s objective is available.


Microsoft elaborated with an example: if someone is working on a dance-related project and wants to collect related data from the browser, they have to type their requirements into the search bar, because the browser itself receives no hint from the other application and offers no automatic suggestions.

Microsoft supports its idea by saying that in the current software model, applications are confined to their own silos, something which ultimately damages productivity and growth.

The first application does not provide the browser implicit hints as to what the user might be seeking when there is a switch from the first application to the second application.

The user perceives tasks in the totality. However, since applications are typically disconnected, and not mediated in any way by the operating system, the computing system has no idea as to the overall goal of the user.

According to Microsoft, a possible solution to this problem is a neutral third-party arbitrator that monitors and learns user behavior and intent from sources such as a word processor, a PDF reader, the comparison and analysis of images the user recently interacted with, the identification of sounds and music, the logging of frequently marked locations and other related contextual data. After gathering this real-time data, the mediator can stockpile it all, remove any identifying information and pass relevant information to Bing, producing automated, accurate and focused results.

The patent notes:

The disclosed architecture comprises a mediation component (e.g., an API (application program interface) as part of the operating system (OS)) that identifies engaged applications—applications the user is interacting with for task completion (in contrast to dormant applications—applications the user is not interacting with for task completion), and gathers and actively monitors information from the engaged applications (e.g., text displayed directly to the user, text embedded in photos, fingerprint of songs, etc.) to infer the working context of a user. The inferred context can then be handed over to one of the applications, such as a browser (the inferred context in a form which does not cross the privacy barrier) to provide improved ranking for the suggested queries through the preferred search provider. Since the context is inferred into concepts, no PII (personally-identifiable information) is communicated without user consent—only very high-level contextual concepts are provided to the search engines.
The architecture enables the capture of signals (e.g., plain text displayed to the user, text recognized from images, audio from a currently playing song, and so on), and clusters these signals into contextual concepts. These signals are high-level data (e.g., words) that help identify what the user is doing. This act of capturing signals is temporal, in that it can be constantly changing (e.g., similar to running average of contextual concepts). The signals can be continuously changing based on what the user is doing at time T (and what the user did from T-10 up to time T).
When using the browser application as the application that uses the captured signals, the browser broadcasts and receives (e.g., continuously, periodically, on-demand, etc.) with the mediation component through a mediation API of the mediation component to fetch the latest contextual concepts.
When the user eventually interacts with, or is anticipated to interact with, the browser (as may be computed as occurring frequently and/or based on a history of sequential user actions that results in the user interacting with the browser next), the contextual concepts are sent to the search provider together with the query prefix. The search engine (e.g., Bing™ and Cortana™ (an intelligent personal digital speech recognition assistant) by Microsoft Corporation) uses contextual rankers to adjust the default ranking of the default suggested queries to produce more relevant suggested queries for the point in time. The operating system, comprising the function of mediation component, tracks all textual data displayed to the user by any application, and then performs clustering to determine the user intent (contextually).
The inferred user intent sent as a signal to search providers to improve ranking of query suggestions, enables a corresponding improvement in user experience as the query suggestions are more relevant to what the user is actually trying to achieve. The architecture is not restricted to text, but can utilize recognized text in displayed photos as well as the geo-location information (e.g., global positioning system (GPS)) provided as part of the photo metadata. Similarly, another signal can be the audio fingerprint of a currently playing song.
As indicated, query disambiguation is resolved due to the contextual and shared cache which can be utilized by various applications to improve search relevance, privacy is maintained since only a minimally sufficient amount of information is sent from one application to the another application, and the inferred user context can be shared across applications, components, and devices.
The mediation component can be part of the OS, and/or a separate module or component in communication with the OS, for example. As part of the OS, the mediation component identifies engaged non-OS applications on the device and, gathers and actively monitors information from the engaged applications to infer the working context of the user. The inferred context can then be passed to one of the applications, such as the browser in a secure way to provide improved ranking for the suggested queries through the preferred search provider.
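To make the quoted description more concrete, here is a minimal Python sketch of the idea as we read it: signals are collected over a short time window, reduced to a few high-level concepts, and then used to reorder default query suggestions. Everything in it is an assumption for illustration. The names Mediator, observe, concepts and rerank are hypothetical, the 10-second window only echoes the patent’s “T-10” wording, and simple word counting stands in for the clustering step, which the patent text does not specify.

```python
from collections import Counter, deque

# Hypothetical stop-word list; a real system would need something far richer.
STOPWORDS = {"a", "an", "and", "for", "of", "the", "to"}


class Mediator:
    """Illustrative stand-in for the patent's mediation component."""

    def __init__(self, window_seconds=10.0):
        self.window_seconds = window_seconds
        self._signals = deque()  # (timestamp, word) pairs, oldest first

    def observe(self, text, now):
        """Record the words an engaged application displayed at time `now`."""
        for raw in text.lower().split():
            word = raw.strip(".,:;!?\"'()")
            if word and word not in STOPWORDS:
                self._signals.append((now, word))

    def concepts(self, now, top_n=2):
        """Return the most frequent words seen within the time window."""
        cutoff = now - self.window_seconds
        # Discard signals that have fallen out of the window.
        while self._signals and self._signals[0][0] < cutoff:
            self._signals.popleft()
        counts = Counter(word for _, word in self._signals)
        return [word for word, _ in counts.most_common(top_n)]


def rerank(suggestions, concepts):
    """Move suggestions that mention more concepts toward the top.

    sorted() is stable, so suggestions with equal scores keep
    their default order.
    """
    def score(suggestion):
        words = set(suggestion.lower().split())
        return sum(1 for concept in concepts if concept in words)

    return sorted(suggestions, key=lambda s: -score(s))


mediator = Mediator(window_seconds=10.0)
mediator.observe("Salsa dance steps for beginners", now=0.0)
mediator.observe("Dance class schedule: salsa and tango", now=5.0)

concepts = mediator.concepts(now=6.0)
default_suggestions = ["samsung galaxy", "salsa recipe", "salsa dance lessons"]
ranked = rerank(default_suggestions, concepts)
print(concepts)
print(ranked)
```

In this toy version only the short list of concepts, not the text the user was reading, would be handed to the search provider, which is roughly the privacy argument the patent makes. Whether a real implementation could keep that promise is a separate question.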


Of course, the major concern for users is the threat of compromised information, something no amount of assurance from Microsoft can relieve. The idea of the patent is somewhat similar to Google’s Now on Tap or Screen Search, a tool that scrapes the working screen for contextual information and launches a Google search in response, though Microsoft’s idea is far more autonomous.

The company says it could introduce this mediator either as a built-in feature or as an optional module that can be installed on Windows 10. In the latter case, the platform could revolutionize automated searches and potentially be a powerful tool for contextually aware computing. But if it arrives as a built-in feature, the OS would feel far too intrusive on a personal level, and most users would be looking for a way to turn the functionality off.

