Extracting patient’s intentions from the Web

October 11, 2014 | Newsletters | By Eleni Kaldoudi-admin

Analysis of the CARRE use cases and project aims shows that decision support would benefit from resolving two different types of patient intention: a) the intention to search for information on cardiorenal disease concepts; and b) the intention to travel, particularly to places where environmental and dietary conditions may require adjustments to the patient’s diet and physical activity. Thus, the main goal of this aggregator is to extract a patient’s intentions from her web searches.

The main parts of this aggregator are the Query Detector and the User Intention Extractor. Both run on the patient side, specifically on the patient’s personal computer. The Query Detector is a browser extension (e.g. a Firefox add-on or a Chrome extension) that detects the user’s queries to web search engines (e.g. Google, Bing and Yahoo) and forwards them to the User Intention Extractor for further processing. The User Intention Extractor stores the incoming queries locally (and only locally) and classifies them into specific categories (e.g. traveling, or health and diseases) in order to extract the patient’s intentions.
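The Query Detector’s core task, recognizing a search-results URL and pulling out the query text, can be sketched as follows. The CARRE detector itself is a JavaScript browser extension; this sketch uses Java only to stay consistent with the other examples here. The host-to-parameter mapping (`q` for Google and Bing, `p` for Yahoo) and the `/search` path check are assumptions based on the engines’ public URL formats, not details taken from the CARRE code, and the class name `QueryDetector` is hypothetical. It needs Java 11 or later.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;

public class QueryDetector {
    // Assumed mapping: search-engine host -> name of the URL parameter holding the query.
    private static final Map<String, String> QUERY_PARAM = Map.of(
            "google.com", "q",
            "bing.com", "q",
            "search.yahoo.com", "p");

    /** Returns the search query carried by a results-page URL, or empty if the URL is not a search. */
    public static Optional<String> extractQuery(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        String rawQuery = uri.getRawQuery();
        // Only results pages (path "/search") on a known engine count as searches.
        if (host == null || rawQuery == null || !"/search".equals(uri.getPath())) {
            return Optional.empty();
        }
        for (Map.Entry<String, String> engine : QUERY_PARAM.entrySet()) {
            String domain = engine.getKey();
            if (!host.equals(domain) && !host.endsWith("." + domain)) {
                continue;
            }
            for (String pair : rawQuery.split("&")) {
                int eq = pair.indexOf('=');
                if (eq > 0 && pair.substring(0, eq).equals(engine.getValue())) {
                    // Decode percent-escapes and '+' (space) in the query value.
                    String query = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
                    return query.isBlank() ? Optional.empty() : Optional.of(query);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractQuery("https://www.google.com/search?q=chronic+kidney+disease&hl=en").orElse("(none)"));
        System.out.println(extractQuery("https://search.yahoo.com/search?p=hotels+in+Crete").orElse("(none)"));
        System.out.println(extractQuery("https://www.bing.com/maps?q=Crete").orElse("(none)"));
    }
}
```

Running `main` prints `chronic kidney disease`, `hotels in Crete` and `(none)`; the last URL is rejected because it is a maps page rather than a results page.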

In the categorization process, we apply a web query classification technique (Agrawal, Ritesh, Xiaofeng Yu, Irwin King, and Remi Zajac. “Enrichment and Reductionism: Two Approaches for Web Query Classification.” In Neural Information Processing, pp. 148-157. Springer Berlin Heidelberg, 2011) that uses documents from the World Wide Web (WWW) to enrich the target categories and then models web query classification as a search problem. In addition, we provide a mechanism that extracts extra features for some of the categories, such as a geotag for “traveling” queries. Finally, only the intentions relevant to the CARRE system are uploaded to the patient’s Private RDF. The communication between the User Intention Extractor and the Private RDF is encrypted (HTTPS), and an appropriate authentication mechanism (OAuth) is used to identify the patients.
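The idea of treating query classification as a search problem can be illustrated with a deliberately simplified sketch: each target category is represented by documents assigned to it, the query is run against those documents, and the categories of the top-k retrieved documents vote on the label. This is not the exact method of Agrawal et al.; the term-overlap scoring is a hypothetical stand-in for a real retrieval engine such as Indri, and the class name, toy documents and value of k are illustrative assumptions. It requires Java 16 or later (records).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class QueryCategorizer {
    /** A web document used to enrich one target category. */
    public record Doc(String category, String text) {}

    private final List<Doc> docs;
    private final int k;

    public QueryCategorizer(List<Doc> docs, int k) {
        this.docs = docs;
        this.k = k;
    }

    /** Lower-cases and splits text into non-empty word tokens. */
    private static List<String> tokens(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    /** Counts occurrences of query terms in the document (toy stand-in for a retrieval score). */
    private static int score(List<String> queryTerms, Doc doc) {
        int s = 0;
        for (String t : tokens(doc.text())) {
            if (queryTerms.contains(t)) s++;
        }
        return s;
    }

    /** Retrieves the top-k matching documents and returns the category with the most votes. */
    public Optional<String> classify(String query) {
        List<String> q = tokens(query);
        Map<String, Integer> votes = new HashMap<>();
        docs.stream()
                .filter(d -> score(q, d) > 0)
                .sorted(Comparator.comparingInt((Doc d) -> score(q, d)).reversed())
                .limit(k)
                .forEach(d -> votes.merge(d.category(), 1, Integer::sum));
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        QueryCategorizer c = new QueryCategorizer(List.of(
                new Doc("health", "Chronic kidney disease: symptoms and treatment"),
                new Doc("health", "Heart failure and kidney function in diabetes"),
                new Doc("traveling", "Cheap flights and hotels in Crete"),
                new Doc("traveling", "Travel guide for Crete: beaches and hotels")), 3);
        System.out.println(c.classify("kidney disease treatment").orElse("(unknown)"));
        System.out.println(c.classify("hotels in Crete").orElse("(unknown)"));
    }
}
```

With these toy documents the two queries print `health` and `traveling`. The stop word “in” gives a health document a small score for the travel query, which is one reason a real system would rely on a proper ranking model rather than raw term overlap.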

The current implementation status of the aggregator is as follows:

Query Detector (v0.2 Beta): Fully implemented in JavaScript as a browser extension for Firefox v34 and Chrome v39.

Query Extractor: Implemented. It extracts queries from the Google, Bing and Yahoo web search engines.
Query Sender: Implemented. It sends the queries in JSON format over HTTP to the User Intention Extractor.

User Intention Extractor (v0.1 Alpha): Partially implemented in Java (JDK v1.8) using the NetBeans IDE v8.0.2. It will run as a Java application that starts after the user logs in.

Query Receiver: Implemented. It stores the incoming queries in the local database.

Local Storage of Queries: Implemented using an SQLite v3.8.7 database.
Index of Categorized Documents: In progress. The document collection is being built in Java from the ClueWeb09_B dataset (50 million English documents), and the index is being created with the Lemur Toolkit (Indri v5.7).
Query Categorizer: In progress. It is being developed in Java using the JNI-based Java interface of the Lemur Toolkit (Indri v5.7), which provides the underlying search engine.
Feature Extractor: Not yet implemented. It will be developed in Java, probably with the help of the Lemur Toolkit.
Patient Intention Sender: Not yet implemented. It will be developed in Java and will authenticate the user with the OAuth protocol. The patient’s intentions will be sent to the Private RDF using SPARQL over HTTPS.
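The planned Patient Intention Sender could look roughly like the sketch below, which builds a SPARQL 1.1 `INSERT DATA` update for one intention and wraps it in an HTTPS POST carrying an OAuth access token. Several details are assumptions rather than CARRE specifics: the `ci:` vocabulary and `example.org` IRIs are hypothetical, OAuth 2.0 bearer tokens are assumed (the post only says “OAuth”), and the request follows the SPARQL 1.1 Protocol convention of POSTing the update with content type `application/sparql-update`. It uses `java.net.http` (Java 11 or later), and `main` only builds the request, so it runs without a live endpoint.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PatientIntentionSender {
    // Hypothetical vocabulary; the actual CARRE RDF schema is not described in this post.
    private static final String NS = "http://example.org/carre/intention#";

    /** Escapes a string as a double-quoted SPARQL literal. */
    public static String literal(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r") + "\"";
    }

    /** Builds an INSERT DATA update for one intention; geotag may be null. patientIri must be a valid IRI. */
    public static String buildUpdate(String patientIri, String category, String geotag) {
        StringBuilder sb = new StringBuilder()
                .append("PREFIX ci: <").append(NS).append(">\n")
                .append("INSERT DATA {\n")
                .append("  _:i ci:patient <").append(patientIri).append("> ;\n")
                .append("      ci:category ").append(literal(category));
        if (geotag != null) {
            sb.append(" ;\n      ci:geotag ").append(literal(geotag));
        }
        return sb.append(" .\n}").toString();
    }

    /** Wraps the update in an HTTPS POST with an OAuth 2.0 bearer token (assumed token type). */
    public static HttpRequest buildRequest(URI endpoint, String accessToken, String update) {
        if (!"https".equalsIgnoreCase(endpoint.getScheme())) {
            throw new IllegalArgumentException("intentions must only be sent over HTTPS");
        }
        return HttpRequest.newBuilder(endpoint)
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/sparql-update")
                .POST(HttpRequest.BodyPublishers.ofString(update))
                .build();
    }

    /** Sends the request and returns the HTTP status; needs a real endpoint and a valid token. */
    public static int send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        String update = buildUpdate("http://example.org/patient/42", "traveling", "Crete");
        HttpRequest request = buildRequest(
                URI.create("https://rdf.example.org/sparql/update"), "ACCESS_TOKEN", update);
        System.out.println(update);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Placing the HTTPS check inside the sender, rather than trusting configuration, is one way to enforce the requirement that communication with the Private RDF is always encrypted.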

Author: George Drosatos (DUTH)

Date: 11 October 2014
