Understanding Concept Identification as Consistent Data Clustering Across Multiple Feature Spaces

Felix Lanfermann, Sebastian Schmitt, Patricia Wollstadt, "Understanding Concept Identification as Consistent Data Clustering Across Multiple Feature Spaces", IEEE International Conference on Data Mining Workshops (ICDMW), 2022.


Identifying meaningful concepts in large data sets can provide valuable insights into engineering design problems. Concept identification aims to find non-overlapping groups of design instances that are similar in the joint space of all features and that remain similar when only subsets of features are considered. These subsets usually comprise features that characterize a design with respect to one specific context, for example, constructive design parameters, performance values, or operation modes. It is desirable to evaluate the quality of design concepts by considering several of these feature subsets in isolation. In particular, meaningful concepts should not only identify dense, well-separated groups of data instances, but also provide non-overlapping groups of data that persist when pre-defined feature subsets are considered separately. In this work, we extend the scope of the concept identification process beyond the engineering design domain and propose the viewpoint that concept identification can be regarded as a more general form of clustering. We apply a recently proposed concept identification algorithm to two synthetic data sets and illustrate how it differs from classical clustering algorithms. In addition to established cluster evaluation metrics, we focus on the mutual information measure to assess the information gain provided to a decision maker by identifying consistent clusters across different subspaces. To support this novel understanding of concept identification, we consider a simulated data set from a decision-making problem in the energy management domain and show that the identified clusters are more interpretable than clusters found by common clustering algorithms and are thus more suitable to support a decision maker.
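To illustrate how mutual information can express cluster consistency across feature subspaces, here is a minimal sketch in Python. It is not the concept identification algorithm from the paper. It only computes the mutual information between two cluster assignments of the same samples, and the label lists (`labels_params`, `labels_perf_consistent`, `labels_perf_inconsistent`) are hypothetical toy data standing in for clusterings obtained separately in a "design parameter" subspace and a "performance" subspace.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Mutual information (in bits) between two label assignments of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    n = len(labels_a)
    count_a = Counter(labels_a)                  # cluster sizes in subspace A
    count_b = Counter(labels_b)                  # cluster sizes in subspace B
    count_ab = Counter(zip(labels_a, labels_b))  # joint cluster memberships
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) )
        mi += (n_ab / n) * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical cluster labels for eight samples (assumed toy data).
labels_params = [0, 0, 0, 0, 1, 1, 1, 1]             # clustering in parameter subspace
labels_perf_consistent = [1, 1, 1, 1, 0, 0, 0, 0]    # same grouping, labels renamed
labels_perf_inconsistent = [0, 1, 0, 1, 0, 1, 0, 1]  # grouping unrelated to the first

mi_consistent = mutual_information(labels_params, labels_perf_consistent)
mi_inconsistent = mutual_information(labels_params, labels_perf_inconsistent)

print(f"consistent grouping:   {mi_consistent:.3f} bits")
print(f"inconsistent grouping: {mi_inconsistent:.3f} bits")
```

In this toy setting, a grouping that persists across both subspaces yields high mutual information: knowing a sample's cluster in one subspace reveals its cluster in the other. A grouping that does not persist yields none. Label names do not matter, only the co-occurrence structure.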
