| Our research programs presently focus on advanced business intelligence and information management. The areas of interest include parallel data warehousing, data mining and search, scalable analytics, Web content discovery, multimedia information processing, Grid computing, and industry solutions. The research programs are targeted at new software and service opportunities that leverage HP's core strengths; they are often conducted in close collaboration with researchers at Chinese universities and are motivated by applications in China. |
|
 |
 |
 |
|
 |
 |
|
 |
 |
 |
|
 |
 |
| At least tens of billions of web pages are reachable from the public Internet, and the deep web contains even more content. To find useful content in this sea of information, people usually use a general-purpose search engine like Google. Although such search engines are very effective at finding relevant web pages, they do not give precise and complete answers to more specific questions about a particular domain.
We are exploring technologies for domain-specific web content discovery and management. A domain characterizes a certain content type. For example, the online course domain consists of all the course pages on the web, and the product information in a certain industry is another domain. If we want to know which universities in the US and China offer Database classes, a general-purpose search engine usually does not give a good answer. Therefore, technologies that can collect and organize information in a specific domain are very useful.
We are engaged in research on focused crawling, classification, and metadata extraction. One example targets online courseware, in partnership with researchers at Peking University. This work enables us to build an online courseware portal called OfCourse for the higher education community in China and in other parts of the world. The technologies can be generalized to other target domains, such as entertainment information, financial service information, or competitive intelligence.
|
|
 |
 |
 |
|
 |
 |
| Multimedia is an increasingly popular form of digital content. Large-scale search and retrieval technologies for multimedia information are expected to gain importance in both the consumer and enterprise sectors. Our research in multimedia information management aims at developing technologies for automating concept extraction and providing search and retrieval over a large quantity of multimedia information.
Presently we focus our effort on building a video data warehouse. We investigate multi-modality based mechanisms for content analysis, as well as mechanisms based on learning from user feedback. Parallel computing, parallel database systems, and high-dimensional indexing are being explored to scale feature extraction, correlation, classification and annotation.
This effort is a collaboration with the China National Lab of Intelligent Information System at Tsinghua University. It is also one of the research projects presently under way at the Tsinghua-HP Multimedia Research Center.
|
|
 |
 |
 |
|
 |
 |
| This program examines innovative parallel architectures and algorithms for managing and processing massive amounts of data, and for integrating information from a large number of sources. Combining parallel computing with large scale data management, we are examining technologies that will help scale compute-intensive applications to very large data sets, and to harness parallelism in achieving real-time responses. Instead of bringing data to where the programs execute in a parallel computing environment, we explore approaches that bring programs to where the data resides.
We collaborate with research partners who build simulation models or predictive models in scientific disciplines. One example of our effort is a project in the area of hydro-informatics, in partnership with the China National Lab of Hydraulic Engineering at Tsinghua University. Scientists need to draw upon a very large set of temporal and spatial data to build predictive models for water resources around large river systems. We are therefore exploring methods to partition the large data set among nodes in a computing cluster and to distribute computation among the nodes while optimizing for the temporal and spatial dependencies inherent in the computation. We also study methods to balance on-demand computation against pre-materialized, stored results.
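The partitioning idea described above can be illustrated with a small sketch. It is not the project's actual method. It assumes a simplified model in which a river is a one-dimensional chain of segments, each segment's computation depends on its upstream neighbor, and all time steps for a segment are stored on the same node. The function names are hypothetical. The sketch compares contiguous range partitioning with naive hash partitioning by counting neighbor dependencies that cross node boundaries.

```python
def partition_by_range(num_segments, num_nodes):
    """Assign contiguous runs of river segments to nodes (range partitioning)."""
    base, extra = divmod(num_segments, num_nodes)
    assignment = {}
    segment = 0
    for node in range(num_nodes):
        # The first `extra` nodes take one additional segment each.
        size = base + (1 if node < extra else 0)
        for _ in range(size):
            assignment[segment] = node
            segment += 1
    return assignment


def partition_by_hash(num_segments, num_nodes):
    """Assign segments round-robin, ignoring spatial adjacency."""
    return {segment: segment % num_nodes for segment in range(num_segments)}


def cross_node_dependencies(assignment):
    """Count upstream/downstream neighbor pairs stored on different nodes."""
    return sum(
        1
        for segment in range(len(assignment) - 1)
        if assignment[segment] != assignment[segment + 1]
    )
```

Under these assumptions, range partitioning keeps most neighbor dependencies local to a node, while round-robin placement forces nearly every dependency to cross the network. This is the kind of trade-off a dependency-aware partitioner tries to optimize.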
We also examine emerging technologies that enable multiple sources of information to be integrated. Multiple sources of geo-spatial information, including satellite images, maps, etc., can be "mashed up" with results of scientific models that predict environmental conditions, such as water levels in major rivers. We utilize emerging standards and new technologies to perform information integration as well as 3D visualization.
|
|
 |
 |
 |
|
 |
 |
| Digital content management and preservation has been a key research effort at HP Labs. One result of the effort is the DSpace system, an open source digital repository system initially developed by HP Labs in collaboration with MIT Libraries. The DSpace software platform enables organizations to capture, store, index, preserve and distribute their digital assets. Over 110 organizations worldwide have used DSpace to build digital library systems since 2001.
At HP Labs China, we are examining the issue of federating multiple digital content management systems and presenting a common portal to search their contents. Federation allows the contents distributed in multiple organizations to be aggregated using a consistent vocabulary for centralized searching and browsing. We are developing technologies for building a large-scale, distributed digital content management infrastructure based on DSpace.
Our present effort targets university digital museums in China, a project initiated by the Chinese Ministry of Education. While each university is responsible for digitizing and preserving its own collection of museum objects, sharing these contents across multiple universities requires federated content management. We are collaborating with Beihang University in an effort to build a federated China university digital museum. We have developed a federated DSpace, DM-DSpace, which can be configured either as the federated system (data center) or as a local museum. DM-DSpace is also being leveraged in other China digital museum projects, such as the China Digital Science and Technology Museum.
|
|
 |
 |
 |
|
 |
 |
| An important function provided by the management system for Computation Grids is the ability of administrators and users to monitor the performance and status of jobs executing in the Grid. The heterogeneity, complexity, dynamism and scale of modern Grids present challenges to delivering this functionality. Monitoring data is often collected by a heterogeneous mix of monitoring tools with different access interfaces and managed by disparate organizations, while the tasks of a job are scheduled dynamically on one or more nodes. Traditional monitoring solutions provide few mechanisms for assimilating data about such a distributed job and its component tasks, while manual collection is error-prone, slow, fragile, and prevents using the data to drive other automation tasks.
We are developing monitoring solutions to address these issues. In particular, we collaborate with several key Chinese universities responsible for ChinaGrid development. ChinaGrid, an initiative sponsored by the Chinese Ministry of Education, is one of the largest Grid computing projects in the world. Our solution works with the job schedulers in a Grid and existing data collectors to automatically track and monitor job execution. It automatically generates CIM models that describe the ChinaGrid infrastructure and a job's use of the infrastructure, which are then used to automatically establish a link between a job or its subcomponents and the monitoring data for it. The Web service standard is used to present the data consumer with a common interface to the data. Our solution scales well and effectively eliminates the barriers to accessing the monitoring data.
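As a rough illustration of linking a job to its monitoring data, the sketch below joins task placement records with monitoring samples by node and time window. It is only a simplified stand-in. It does not use CIM or any ChinaGrid interface, and the `Task` and `Sample` records and the `samples_for_job` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Where and when one task of a job ran (hypothetical record)."""
    job_id: str
    task_id: str
    node: str
    start: float
    end: float


@dataclass(frozen=True)
class Sample:
    """One measurement reported by a monitoring tool (hypothetical record)."""
    node: str
    timestamp: float
    metric: str
    value: float


def samples_for_job(job_id, tasks, samples):
    """Map each task of a job to the samples taken on its node while it ran."""
    linked = {}
    for task in tasks:
        if task.job_id != job_id:
            continue
        linked[task.task_id] = [
            sample
            for sample in samples
            if sample.node == task.node
            and task.start <= sample.timestamp <= task.end
        ]
    return linked
```

Given the scheduler's placement records and a merged stream of samples from different collectors, a call such as `samples_for_job("j1", tasks, samples)` returns the measurements relevant to each task of that job, which is the association that manual collection makes error-prone.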
|