Data modeling

Data modeling refers to methods in computer science for formally representing the objects that are relevant in a defined context by means of their attributes and relationships. The main objective is the unambiguous definition and specification of the objects to be managed in an information system, the attributes required for its information purposes, and the relationships between the information objects, so as to obtain an overview of the data view of the information system (see Ferstl / Sinz 2006, p. 131).

The results are data models which, possibly after several modeling stages, ultimately lead to operational databases or data stores.

Data modeling can also be used outside application development projects, for example to record and document the data of a particular business unit, a department, a business process, etc. (up to the entire company), to describe the relationships among the data, and/or to establish uniform terminology.

Data models typically have a much longer lifespan than functions and processes, and thus than software. The rule is: "Data is stable, functions are not". Data can, for example, continue to be used when software is replaced. New functions (or functional extensions) may additionally require existing (and already modeled) data. Hence a further principle: "Data is common property", i.e. within a defined area of data responsibility (for example, company-internal), data should in principle be available to all applications that need it and should not 'belong' (at least not exclusively) to one particular IT application.

Procedure

Data modeling, as an essential part of the discipline of software development, runs through different phases of a project. The activities are organized as a process: each has goals or purposes, activities, and results, which build on one another and lead via intermediate results to the final results. As milestones in the course of the project, the following model variants are mainly produced:

  • Conceptual database schema: Starting from the observation of a section of the real world, the relevant objects with all their relevant properties and the relevant relationships between them are collected, analyzed, and formulated graphically and textually. The basis for this is the specifications or statements about the given task (= context), refined if necessary through discussions with the clients.
  • Logical database schema: The conceptual database schema is mapped to a logical database schema. In the process, the model is extended with data-related technical details (e.g. field formats, identifying search keys). The logical database schema follows the rules of the structure prescribed by the DBMS to be used, such as the relational data model, in which all data is stored in tables.
  • Physical database schema: To implement the data model with a specific database management system (DBMS), all specifications must be formulated in the syntax of that DBMS so that the database can be generated. In part, this can be done automatically or semi-automatically using generators.
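As a rough illustration of the step from the logical to the physical level, the following minimal Python sketch describes one table of a logical schema as plain data and lets a small generator produce the statement for a concrete DBMS. SQLite (via Python's standard sqlite3 module) is assumed as the DBMS; the Column and Table classes, the generate_ddl function, and the customer example are hypothetical and not taken from any particular modeling tool.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    sql_type: str              # field format, added at the logical level
    primary_key: bool = False  # identifying key
    nullable: bool = True


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)


def generate_ddl(table: Table) -> str:
    """Render one logical table as a CREATE TABLE statement (SQLite syntax)."""
    parts = []
    for col in table.columns:
        text = f"{col.name} {col.sql_type}"
        if col.primary_key:
            text += " PRIMARY KEY"
        elif not col.nullable:
            text += " NOT NULL"
        parts.append(text)
    return f"CREATE TABLE {table.name} ({', '.join(parts)})"


# Logical schema for a hypothetical 'customer' entity type.
customer = Table("customer", [
    Column("customer_id", "INTEGER", primary_key=True),
    Column("name", "TEXT", nullable=False),
    Column("city", "TEXT"),
])

ddl = generate_ddl(customer)

# Physical level: the generated statement is executed by a concrete DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute("INSERT INTO customer (name, city) VALUES (?, ?)", ("Example Ltd", "Bonn"))
rows = conn.execute("SELECT customer_id, name, city FROM customer").fetchall()
```

A real generator would also have to cover relationships, constraints, and DBMS-specific options; the sketch only shows the principle that the physical schema can be derived mechanically from a sufficiently precise logical description.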

These three model levels and the associated procedure outline only a basic approach. In detail, the procedure, the (intermediate) results, and even the names of the models are often determined by company-specific process models and by the modeling methodology and software used. Examples:

  • When the later DBMS itself is used as the modeling tool, the boundaries between the models blur; the models evolve gradually into the final database.
  • For data sets not operated under a DBMS, only a 'copybook' is created, almost as a substitute for a database schema; with it, the data structure definitions are included in programs and can thus be used.

In general, data modeling covers only data that belongs to the business and subject-matter purpose of the systems, not data that belongs to the software in the narrower sense, such as configuration data or parameter data. The latter is created directly in suitable data storage forms as a prerequisite for technical operation.

Activities for each data model level (examples):

To illustrate the data modeling procedure, some example activities are listed below that may be the focus at each level. The examples assume modeling with the entity-relationship method and the use of relational databases.

For the conceptual database schema:

  • Identifying the relevant information requirements (attributes)
  • From these: identifying entity types and relationship types
  • Assigning the attributes to entity types
  • Defining possible attribute values; proposing identifying attributes
  • Determining the relationship cardinality
  • Describing the entity types, relationship types, and attributes in business terms
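The conceptual activities above can be sketched in a few lines of Python. This is only an illustration under assumptions: the EntityType and RelationshipType classes and the Customer/Order example are hypothetical, and real projects would normally record this information in an ER diagram and its textual description rather than in code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EntityType:
    name: str
    attributes: Tuple[str, ...]  # attributes assigned to this entity type
    identifier: str              # proposed identifying attribute


@dataclass(frozen=True)
class RelationshipType:
    name: str
    source: EntityType
    target: EntityType
    cardinality: str             # e.g. "1:n"


customer = EntityType("Customer", ("customer_no", "name", "city"), "customer_no")
order = EntityType("Order", ("order_no", "order_date"), "order_no")
places = RelationshipType("places", customer, order, "1:n")


def describe(rel: RelationshipType) -> str:
    """Short textual description of a relationship type."""
    return f"{rel.source.name} {rel.name} {rel.target.name} ({rel.cardinality})"


summary = describe(places)
```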

For the logical database schema:

  • Methodically checking the business-level model (e.g. by normalization)
  • In the process: forming new entity types, e.g. by specialization / generalization
  • Decision: under which data management system (DBMS, ...) will the data be managed?
  • Transferring the ER model into a relational model
  • Defining the identifying keys
  • Specifications for the technical implementation of relationships: foreign keys, relationship tables
  • Defining further options for direct access (secondary keys)
  • Specifications for referential integrity
  • Extending the database model with regard to historization and version management, multi-tenancy, etc.
  • Completing the model with lookup tables, parameter tables, etc.
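Several of these logical-level decisions (identifying keys, foreign keys, a relationship table, and referential integrity) can be illustrated with a small sketch. It assumes SQLite through Python's standard sqlite3 module; the table and column names are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable referential integrity checks

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,            -- identifying key
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    title       TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)  -- foreign key
);
-- Relationship table resolving the n:m relationship between orders and products.
CREATE TABLE order_item (
    order_id    INTEGER NOT NULL REFERENCES customer_order(order_id),
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")

conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Example Ltd')")
conn.execute("INSERT INTO customer_order (order_id, customer_id) VALUES (10, 1)")

# An order for a customer that does not exist violates referential integrity.
try:
    conn.execute("INSERT INTO customer_order (order_id, customer_id) VALUES (11, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM customer_order").fetchone()[0]
```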

For the physical database schema:

  • Defining optimization options for data access (e.g. by index definitions)
  • Formulating the scripts / commands to set up and configure the database (in the syntax of the DBMS)
  • Specifications for data backup
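An index definition, as one example of a physical-level optimization, might look as follows. The sketch again assumes SQLite via Python's sqlite3 module, with a hypothetical customer table and index name. It asks the DBMS for its query plan to check whether the index would be used; the exact wording of the plan differs between SQLite versions, so the code only looks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
# Physical-level decision: support lookups by city with a secondary index.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")

# List the indexes the DBMS has recorded for the table.
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'customer'"
    )
]

# Ask SQLite how it would execute a query that filters on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM customer WHERE city = ?", ("Bonn",)
).fetchall()
plan_text = " ".join(row[3] for row in plan)  # fourth column holds the description
uses_index = "idx_customer_city" in plan_text
```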

Methods

Data modeling methods include the following, some of which are combined:

  • Bottom-up: collecting individual attributes, identifying potential keys, grouping them into object types, building relationships (special form: canonical synthesis)
  • Top-down: identifying object types, building relationships, identifying elementary attributes
  • Generalization and specialization of object types for the purposes of inheritance
  • Re-engineering of existing schemas
  • Setting up tables as a relational model, followed by normalization
  • Analysis of existing lists, outputs, reports, etc.
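The idea behind normalization can be shown on a small scale. The following sketch assumes a hypothetical flat order list, such as might be found in an existing report, in which the customer name depends only on the customer number. Splitting the list into a customer table and an order table removes that redundancy. The data and field names are invented for illustration.

```python
# Flat records as they might appear in an existing list or report.
flat_rows = [
    {"order_no": 1, "customer_no": 7, "customer_name": "Example Ltd", "amount": 120},
    {"order_no": 2, "customer_no": 7, "customer_name": "Example Ltd", "amount": 80},
    {"order_no": 3, "customer_no": 9, "customer_name": "Sample Inc", "amount": 50},
]

customers = {}  # customer_no -> customer attributes, stored once per customer
orders = []     # one row per order, referring to the customer by key only

for row in flat_rows:
    customers[row["customer_no"]] = {"customer_name": row["customer_name"]}
    orders.append(
        {
            "order_no": row["order_no"],
            "customer_no": row["customer_no"],  # acts as a foreign key
            "amount": row["amount"],
        }
    )
```

After the split, a change to a customer's name has to be made in exactly one place, which is the practical benefit normalization aims at.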

The results of data modeling are data models, which may take the form of an entity-relationship model (ERM), and ultimately operational databases. An ERM consists of an entity-relationship diagram (ERD), for example according to UML or IDEF1X, and a textual description of the model and its components.

Design patterns: As in other design processes in computer science, design patterns play a major role in data modeling; they are available for a number of subject areas. These include historization, multilingual support, and multi-tenancy, as well as sub-models such as addresses, organizational structures, and role and permission structures. Entire prefabricated data models, for example for the financial sector, can also serve as a design source. The most common patterns are listed in Fowler, Hay and Silverston.

Metamodeling: An important area of application for design patterns is metamodeling. Moriarty calls this form of modeling dynamic modeling. In a metamodel, in contrast to a concrete data model, the data content also becomes a relevant part of the data model.

Different terms for similar concepts: In practice, data modeling does not always use uniform terminology. This is partly due to the methods used, partly because terms have 'grown historically' in the respective organizations (and are not always methodologically correct), and partly because terms from different modeling stages are mixed. Examples are:

  • For model graphics: ER diagram, class diagram, data model, information structure, information map
  • For entities: entity, object, information object, class, table, row
  • For relationships: relationship, foreign key
  • For attributes: property, field, data field, attribute, column

As can be seen, instance terms (entity, relationship) are in some cases used instead of type terms (entity type, ...), or even terms from the database implementation (table, ...) are used. Different terms are also used when participants from different companies or different functional areas (business department, programming) communicate. In the interest of efficient communication and to avoid misunderstandings, care should be taken to use correct and consistent terminology.

Support by software tools

Like all software development processes, data modeling is carried out using certain tools. In project practice, very different approaches can be observed, which are outlined in the following examples:

  • Only standard software for graphics (for ERDs) and word processing (for describing the components) is used. In practice, only 'free text' is recorded, possibly supported by sample forms; quality assurance can hardly be automated; the tools are not geared to the specific task; not recommended.
  • Simple special-purpose applications in which graphic symbols and descriptions are linked. Example: double-clicking an entity type opens its description; names are identical in graphics and texts; 'foreign' terms can be referenced via a link.
  • The application has a metamodel that determines which information details can or must be recorded. The tool checks the permitted inputs and certain relationships. Example: only 'existing' attributes can be assigned to an entity type.
  • Data dictionary: The components developed are kept as 'data objects' and can be used in multiple projects. A project references only excerpts; extensions, modifications, and deletions are possible on a project-specific basis, etc.
  • Further capabilities of data modeling (DM) tools can include, for example: a version concept, documentation and evaluation functions, multi-user and multi-project capability, multi-tenancy, and an authorization and security concept.
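The kind of check a metamodel-based tool performs can be sketched as follows. The ModelRepository class and its methods are hypothetical and stand in for whatever a real tool implements internally. The sketch only shows the rule mentioned in the list: an attribute must exist in the repository before it can be assigned to an entity type.

```python
class ModelRepository:
    """Hypothetical repository of a modeling tool, enforcing one metamodel rule."""

    def __init__(self):
        self.attributes = set()  # all attributes known to the tool
        self.entity_types = {}   # entity type name -> list of assigned attributes

    def define_attribute(self, name):
        self.attributes.add(name)

    def define_entity_type(self, name):
        self.entity_types[name] = []

    def assign(self, entity_type, attribute):
        """Assign an attribute to an entity type; both must already be defined."""
        if entity_type not in self.entity_types:
            raise KeyError(f"unknown entity type: {entity_type}")
        if attribute not in self.attributes:
            raise ValueError(f"unknown attribute: {attribute}")
        self.entity_types[entity_type].append(attribute)


repo = ModelRepository()
repo.define_attribute("customer_no")
repo.define_attribute("name")
repo.define_entity_type("Customer")
repo.assign("Customer", "customer_no")

try:
    repo.assign("Customer", "shoe_size")  # never defined as an attribute
    accepted = True
except ValueError:
    accepted = False
```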

The following examples can be regarded as particularly highly integrated:

  • Universal specification tool: The model contents modeled for the 'data' are also used (referenced) by the tools with which the functional specifications are created. Usage of the data constructs is thus available there as a where-used reference (field XYZ occurs in formula ABC, in evaluation CDE, ...).
  • "Active data dictionary": The model contents can be used not only in the project but also in the final application, for example to display field names or to perform plausibility checks.
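How an application might use model contents at run time, in the sense of an "active data dictionary", is sketched below. The DATA_DICTIONARY structure, its rule names (min, max, max_length), and the two helper functions are assumptions made for this illustration; actual products define their own formats.

```python
# Hypothetical dictionary entries, as they might be exported from a modeling tool.
DATA_DICTIONARY = {
    "quantity": {"label": "Quantity", "type": int, "min": 1, "max": 999},
    "city": {"label": "City", "type": str, "max_length": 40},
}


def label_for(field_name):
    """Field label to display in the application, taken from the dictionary."""
    return DATA_DICTIONARY[field_name]["label"]


def plausibility_errors(field_name, value):
    """Check a value against the dictionary rules; return a list of messages."""
    rule = DATA_DICTIONARY[field_name]
    errors = []
    if not isinstance(value, rule["type"]):
        errors.append(f"{rule['label']}: expected {rule['type'].__name__}")
        return errors  # further checks make no sense for the wrong type
    if "min" in rule and value < rule["min"]:
        errors.append(f"{rule['label']}: must be at least {rule['min']}")
    if "max" in rule and value > rule["max"]:
        errors.append(f"{rule['label']}: must be at most {rule['max']}")
    if "max_length" in rule and len(value) > rule["max_length"]:
        errors.append(f"{rule['label']}: at most {rule['max_length']} characters")
    return errors


ok = plausibility_errors("quantity", 5)
too_small = plausibility_errors("quantity", 0)
wrong_type = plausibility_errors("city", 123)
```

Because labels and rules live in one place, a change in the dictionary takes effect in the application without changes to the program code, which is the point of the "active" variant.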

The degree of integration of the tools can therefore vary greatly. It largely determines the quality of the modeling processes, and in particular their efficiency.

Examples

Examples of data models include:

  • The product, customer, order, and invoice 'object types' (entity types) in an order processing system that is to be built or procured for a medium-sized trading company, from the point of view of sales. The model of this section of reality can be used to specify the functional requirements of the system.
  • The metamodel of the thesaurus used in a research area, i.e. the specific terminology with its synonyms, narrower and broader terms, and related terms, as a reference for researchers working in this area. A topic map, for example, can be used to represent the resulting data model. The metamodel for this thesaurus can be used to create a database (possibly including an IT application) for recording those terms.
  • The semantic data model for a project management application for order management, as shown in Figure 1.
  • The database schema, as a graphic from the implementation tool MS Access, for the same project management application (Figure 2), with implementation-related extensions or deviations from the semantic model. A database model as an intermediate stage was not created separately here.

This makes clear that which section of reality is relevant is determined by the particular context and the specific purpose.
