
Concept Identification for Complex Data Sets

Felix Lanfermann, "Concept Identification for Complex Data Sets", Bielefeld University, 2023.


Large and complex data sets play an essential role in many engineering and computer science applications. Revealing structures within such data sets, such as groups of similar data samples or correlations between feature values, is often desirable, but generating such insights is far from trivial. The field of concept identification aims to automatically find groups of data samples in large and complex data sets that share common properties. Such concepts provide useful knowledge, for example, in the engineering design domain, where they open the route to further refinement of specific design variants that exhibit promising characteristic features. Moreover, in multi-criteria decision-making problems, concepts can uncover general proximity relations between data samples and support a decision-maker by illuminating trade-offs among several objectives. However, how to define and identify concepts in complex data sets remains an open and difficult scientific question. This work addresses it by presenting an approach to define and assess meaningful and consistent concepts for any type of data set. A novel concept quality metric is proposed, which assigns an objective numeric value to a given definition of concepts. It incorporates several important aspects, such as the overlap between concepts and the consistency of the associated samples across multiple partitions of the full feature set. Additionally, user preferences, given as samples of particular interest or as a desired overall extent of the concepts, can be specified and integrated into the assessment. A concept identification process is then defined as a numerical optimization procedure that maximizes the concept quality metric, thereby identifying an optimal distribution of concepts.
In a series of application examples, the usefulness of the metric is demonstrated by showing that the developed method leads to intuitively reasonable concept distributions for various complex data sets drawn from engineering design, dynamical systems, controller configuration, and energy management.
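The abstract does not spell out the metric, so the following is only a toy illustration of the two ingredients it names (consistency across feature partitions and overlap between concepts), not the thesis's actual formulation. It assumes, hypothetically, that each concept is described by one axis-aligned box per feature partition, that consistency is the intersection-over-union of a concept's members across partitions, and that overlap is the fraction of samples consistently claimed by more than one concept. The names `box_members` and `toy_concept_quality` are invented for this sketch.

```python
import numpy as np


def box_members(X, lower, upper):
    """Boolean mask of the samples (rows of X) lying inside the box [lower, upper]."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    return np.all((X >= lower) & (X <= upper), axis=1)


def toy_concept_quality(partitions, concepts):
    """Hypothetical score: mean consistency of the concepts minus their overlap.

    partitions: list of arrays of shape (n_samples, d_k); the same samples
                described by different subsets of the full feature set.
    concepts:   list of concepts; each concept is a list of (lower, upper)
                boxes, one box per partition.
    """
    consistency_scores = []
    consistent_masks = []
    for boxes in concepts:
        # Membership of each sample in this concept, evaluated separately per partition.
        masks = [box_members(X, lo, hi) for X, (lo, hi) in zip(partitions, boxes)]
        inside_all = np.logical_and.reduce(masks)   # member in every partition
        inside_any = np.logical_or.reduce(masks)    # member in at least one partition
        score = inside_all.sum() / inside_any.sum() if inside_any.any() else 0.0
        consistency_scores.append(score)
        consistent_masks.append(inside_all)
    # Overlap: fraction of samples consistently assigned to more than one concept.
    claims = np.sum(consistent_masks, axis=0)
    overlap = np.mean(claims > 1)
    return float(np.mean(consistency_scores) - overlap)


if __name__ == "__main__":
    X = np.arange(4.0).reshape(-1, 1)  # four samples, one feature, used for both partitions
    concepts = [
        [([0], [1]), ([0], [1])],  # identical boxes in both partitions: fully consistent
        [([2], [3]), ([1], [3])],  # second partition's box also admits sample 1
    ]
    print(toy_concept_quality([X, X], concepts))
```

Maximizing such a score over the box bounds with any general-purpose optimizer would mimic, in spirit, the optimization-based identification process described above; preference terms (samples of interest, desired concept extent) could be added as further weighted terms, again as an assumption rather than the thesis's definition.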
