mmx metadata framework
...the DNA of your data
MMX metadata framework is a lightweight implementation of the OMG Meta Object Facility built on relational database technology. The MMX framework is based on three general concepts:
Metamodel | The MMX Metamodel provides a storage mechanism for various knowledge models. The data model underlying the metadata framework is more abstract than typical metadata models: it consists of only a few abstract entities.
Access layer | Object-oriented methods can be exploited, using inheritance to derive the whole data access layer from a small set of primitives written in SQL. The MMX Metadata Framework provides several diverse methods of data access to fulfill different requirements.
Generic transformation | Many relationships between objects in a metadata model are too complex to be described by simple static relations. Instead, a universal data transformation concept is used, enabling the definition of transformations, mappings and transitions of any complexity.

The X Is For eXtensibility

September 11, 2011 19:24 by mmx

XDTL stands for eXtensible Data Transformation Language. Extensibility here means that new language elements can easily be added without changing the XML schema that defines the core XDTL language. These extensions can, for example, be coded in XDTL and stored as XDTL packages, with task names identifying the extension elements. The XDTL Runtime expects to find the extension element libraries in the directories listed in the extensions.path parameter of the xdtlrt.xml configuration file; the path list is scanned sequentially until a task with a matching name is found.
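As a sketch only, the configuration might look something like the fragment below; apart from the extensions.path parameter name, the element and attribute names are assumptions, not taken from the actual xdtlrt.xml format:

```xml
<!-- hypothetical xdtlrt.xml fragment; only the extensions.path
     parameter name comes from the text above, the rest is invented -->
<properties>
  <property name="extensions.path"
            value="/opt/xdtl/extensions:/home/user/xdtl/ext"/>
</properties>
```

Each directory in the list would be scanned in order until a package task matching the extension element's name is found.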

During package execution an extension is provided with a full copy of the current context. The extension gets access to every variable value present in the calling context, as well as to the attribute values of the extension element, which are converted into variables named after the attributes. From the extension's point of view all these values are 'read-write', but only those passed as variables retain their values after the extension element finishes. In terms of passing values to an extension, variables can thus be seen as 'globals' that return values, and extension element attributes as 'locals' that get discarded.
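To make the variable/attribute distinction concrete, here is a hypothetical invocation; all element, attribute and extension names below are invented for illustration and do not come from the actual XDTL schema:

```xml
<!-- 'rowCount' is a package variable: the extension sees it, may
     modify it, and the new value survives the call ('global').
     'table' is an attribute of the extension element: inside the
     extension it appears as a variable of the same name, but any
     change to it is discarded when the extension returns ('local'). -->
<variable name="rowCount" value="0"/>
<ext:cleanup table="stage.customers"/>
```

After the call, rowCount may hold a value set by the extension, while table no longer exists in the calling context.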

The XDTL syntax definition (XML schema) includes an any element that allows the XDTL language to be extended with elements not specified directly in the schema. The any element in XDTL is defined as

<xs:any namespace="##other" processContents="lax"/>

##other means that only elements from a namespace other than the namespace of the parent element are allowed. In other words, when the parser sees an unknown element it will not complain, but will assume that it could be defined in some other schema. This prevents ambiguity in XML Schema (Unique Particle Attribution). Setting the processContents attribute of an any element to "lax" states that if that 'other' schema cannot be obtained, the parser will not generate an error.
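For context, here is a minimal sketch of how such a wildcard typically sits inside a schema; the surrounding type and element names are illustrative, not taken from the actual XDTL schema:

```xml
<!-- illustrative only: a content model that accepts known elements
     plus any number of elements from other namespaces -->
<xs:complexType name="tasksType">
  <xs:sequence>
    <!-- the core XDTL elements would be declared here -->
    <xs:any namespace="##other" processContents="lax"
            minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
```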

 

So how does this work? Assume that our main script references an external XML schema whose elements are qualified with the prefix 'ext':

xmlns:ext="http://xdtl.org/xdtl-ext"

This external schema defines a single element, "show", with an attribute "text". Here are some examples of what works and what doesn't.
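That external schema might be sketched roughly as follows; the actual schema published at http://xdtl.org/xdtl-ext may of course differ in detail:

```xml
<!-- sketch of an external extension schema defining a single
     element "show" with a required attribute "text" -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://xdtl.org/xdtl-ext"
           elementFormDefault="qualified">
  <xs:element name="show">
    <xs:complexType>
      <xs:attribute name="text" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```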

<ext:show text="sometext"/> works, as the external namespace with element "show" is referenced by the prefix 'ext'.

<show xmlns="http://xdtl.org/xdtl-ext" text="sometext"/> also works, as the namespace reference is 'embedded' in the "show" element.

<show text="sometext"/> does not validate, as the parser looks for element "show" in the current schema (error message Invalid content was found starting with element 'show' is produced).

<ext:show nottext="sometext"/> does not validate either (Attribute 'nottext' is not allowed to appear in element 'ext:show').

<ext:notshow text="sometext"/> validates but still does not work! Since the processContents attribute of the any element is "lax", the parser ignores the fact that the element definition cannot be found. However, the XDTL Runtime complains, as it cannot find the element definition in the extension path list.

 

What if we want to use extensions without an accompanying XML schema? To find out, we remove the reference to the external schema from the script header and run the examples once again.

<ext:show text="sometext"/> no longer validates, as the prefix 'ext' is not defined. The same applies to all the other examples with a prefix in front of the extension element.

<show text="sometext"/> does not validate either, as the parser looks for the extension element in the current schema.

<show xmlns="http://xdtl.org/xdtl-ext" text="sometext"/>, however, validates and works! Although the parser cannot find the schema, it does not complain, thanks to the "lax" processContents attribute. As long as the XDTL Runtime is able to find the library package containing the extension in the path list, everything is fine; otherwise it raises a Command 'Extension' failed error.

 

So here's the summary. Extended elements (commands) can be well-defined (having their syntax definitions in the form of an XML schema) or undefined (just the package, no XML schema), as the transformation designer sees fit. In the former case, extended elements are validated exactly as the core language elements would be; in the latter case they pass without validation. If an undefined, non-validated extension element is executed and does not match its invocation, a run-time error is generated.

 



XDTL (eXtensible Data Transformation Language)

April 12, 2009 21:30 by marx

Traditional ETL (Extract Transform Load) tools broadly used in Data Warehouse environments tend to have two common deficiencies:

- emphasis on the graphical user interface (and the lack of a more efficient code interface) makes the design process slow and inflexible;

- a dedicated ETL server generally means one extra hop for the data being transferred, which might be unacceptable considering today's data loads.

Enter XDTL (eXtensible Data Transformation Language). XDTL is an XML-based descriptive language designed for specifying data transformations from one database or storage to another. The XDTL syntax is defined in an XML Schema document (http://xdtl.org/xdtl). The XML Schema of XDTL carries semantic annotations linking it to the XDTL ontology model.

XDTL documents are interpreted by the XDTL Runtime Engine. XDTL and its runtime engine are built not from the perspective of a slick IDE or a cool engine, but from that of an efficient language for describing data transformations. The goal is a lightweight ETL development/runtime environment that handles most common requirements more efficiently than traditional jack-of-all-trades tools. The XDTL Runtime Engine is currently under development for both .NET and Linux environments. The XDTL language is free for anyone to use.

XDTL documents are stored in well-formed XML files that can be validated with the XDTL XML Schema. A package is the primary unit of execution that can be addressed by the runtime engine. A single XDTL document can contain several packages. Every package has a unique identifier in the form of a URI. There are also special non-executable packages (libraries) that serve as containers of tasks callable by other packages. A package contains an arbitrary number of Variables, an unordered collection of Connections and an ordered collection of Tasks.

Variables are name-value pairs mostly used for parameterization and are accessible by transformations. Connections are used to define data sources and targets used in transformations. Connections can refer to database resources (tables, views, result sets), text files in various formats (CSV, fixed format, Excel files) or Internet resources in tabular format.

Tasks are the smallest units of execution that have a unique identifier. A task is an ordered collection of Transformations that move, create or change data. There is at least one transformation in a task. Tasks can be parameterized in case one or several of their transformations have parameters; in that case all the parameters should have default values defined as package variables.
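The package structure described above can be sketched as a skeletal XDTL document; the element and attribute names below are plausible but assumed, since the actual schema at http://xdtl.org/xdtl is the authority:

```xml
<!-- hypothetical XDTL package skeleton; element names are assumptions -->
<package name="load_customers" xmlns="http://xdtl.org/xdtl">
  <!-- Variables: name-value pairs used for parameterization -->
  <variable name="targetSchema" value="dw"/>

  <!-- Connections: data sources and targets of transformations -->
  <connection name="src" type="file" value="/data/customers.csv"/>
  <connection name="tgt" type="database" value="jdbc:postgresql://dwh/db"/>

  <!-- Tasks: ordered collections of transformations -->
  <task name="load">
    <transformation>
      <!-- move data from src to tgt -->
    </transformation>
  </task>
</package>
```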

What sets XDTL apart from traditional ETL tools? 
 
- while ETL tools in general focus on the graphical IDE and the entry-level user, the needs of professional users are not addressed, and they have to struggle with an inefficient workflow. XDTL relies on XML as the development vehicle, making it easy to generate data transformation documents automatically or with the XML tools of your choice.

- as data amounts grow, the paradigm shifts from ETL to ELT, where the bulk of the transformations takes place inside the (target) database. Most of the fancy features provided by heavyweight ETL tools are therefore rarely or never used, and the main workhorse is SQL. However, there is very little to boost the productivity of SQL generation and reuse. XDTL addresses this with metadata-based mappings and transformations served from a metadata repository, and with transformation templates instead of SQL generation, capturing the typical scenarios in task libraries for easy reuse.

- most of the heavyweight tools try to address every single conceivable problem, which makes solving even trivial tasks obscure and overly complex. They also aim to support every single database product, even if the chances of ever encountering most of them are almost zero. XDTL focuses on the most frequent scenarios and mainstream brands, and puts the emphasis on productivity and efficiency.

- XDTL takes advantage of the general-purpose metadata repository of MMX Framework targeting a broad range of metadata-related activities and not locking the user into an ETL-specific and ETL-only repository from <insert your ETL tool vendor>.
 

 



Metamodel-based validation of models

December 30, 2008 13:01 by marx
When creating a metamodel instance (i.e. a model), the structure of the metamodel can be used to automatically validate the structure of the corresponding model. Basically, every class, association and attribute value can be interpreted as a constraint (validation rule) enforcing certain properties and characteristics of the model (metadata). As metamodels are often seen as 'defining languages for model descriptions', we might consider these rules a syntax check for a model expressed in this language.
 
Constraints (validation rules) can be materialized and enforced during the metadata scanning/loading process, during a dedicated validation maintenance task, on demand, etc. In the MMX framework the rules are implemented on the database level as a set of data constraints of the metadata repository, and form a protective layer transparent to a user or an application built on the framework. Only the 'structural' properties of a metamodel have been implemented; the 'semantic' properties (homonyms, synonyms, reflexivity, transitivity etc.) and their use as validation rules are a separate (and much more complex) topic not covered yet. The rules for model validation implemented in MMX (and how they are enforced through constraints) are as follows:
 
:{M1} objects inherit their type codes from the corresponding classes in the {M2} metamodel(s). Only concrete classes can have corresponding objects.
 
object.Type *partof(objectClass.Type) & objectClass.IsAbstractClass = False
relation.Type *partof(relationClass.Type)
property.Type *partof(propertyClass.Type)
 
:{M2} class names are unique within a namespace, i.e. a {M2} metamodel, and are never empty.

objectClass.Name *isunique(objectClass.Name) & objectClass.Name <> nil

:{M1} parent-child relations between objects are derived from designated associations between their superclasses in {M2} metamodel(s). 
 
object.parent.Type *partof(*tree(relationClass.relatedObject.Type)) & relationClass.IsTaxonomy = True

:{M1} related objects inherit their type codes from {M2} classes and/or their superclasses related through {M2} associations and/or {M2} attributes.
 
relation.object.Type *partof(*tree(relationClass.object.Type))
relation.relatedObject.Type *partof(*tree(relationClass.relatedObject.Type))
property.object.Type *partof(*tree(propertyClass.object.Type))
property.domain.Type *partof(*tree(propertyClass.domain.Type))

:{M1} linear properties, as well as significant elements of hierarchical properties, with an empty (null) value inherit the default value from the corresponding {M2} attributes.

property.Value *coalesce(property.Value, propertyClass.defaultValue)

:The number of {M1} objects participating in a relation cannot exceed or fall below the multiplicity expressed by the corresponding {M2} association on either end.

*numberof(relation.object) *ge(relationClass.multiplicity.minValue)
*numberof(relation.relatedObject) *ge(relationClass.multiplicity.minValue)
*numberof(relation.object) *le(relationClass.multiplicity.maxValue)
*numberof(relation.relatedObject) *le(relationClass.multiplicity.maxValue)

:When the 'whole' object of a {M1} relation (descending from a {M2} association of type 'aggregation') is deleted, the relation itself is also deleted. When the 'whole' object of a {M1} relation (descending from a {M2} association of type 'composition') is deleted, both the relation and the 'parts' object are deleted.

*isdeleted(relation.object) & relationClass.Type = 'AGGR' -> relation := nil
*isdeleted(relation.object) & relationClass.Type = 'COMP' -> relation := nil
*isdeleted(relation.object) & relationClass.Type = 'COMP' -> relation.relatedObject := nil

The implementations are defined in an intuitive semi-formal notation. The operators *isunique, *partof, *tree, *isdeleted, *numberof, *ge and *le denote abstract pseudo-operations of uniqueness, membership, tree of ancestors, deletion, count, greater-than-or-equal and less-than-or-equal, respectively.
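As an illustration only, two of the rules above, the *partof type check and the multiplicity bounds, can be sketched in Python. This is not MMX's actual implementation (which lives in database constraints); the data structures and type codes below are invented:

```python
# Illustrative sketch of two MMX-style validation rules; the names
# and type codes are invented, not taken from the MMX repository.

def partof(obj_type, class_types):
    """*partof: the object's type code must come from its class."""
    return obj_type in class_types

def multiplicity_ok(n_objects, min_value, max_value):
    """*numberof(...) *ge(min) and *le(max): count within bounds."""
    return min_value <= n_objects <= max_value

# An object must carry a type code defined by a concrete class.
concrete_class_types = {"TBL", "COL"}       # hypothetical type codes
print(partof("TBL", concrete_class_types))  # valid object type
print(partof("XYZ", concrete_class_types))  # unknown type -> invalid

# An association with multiplicity 1..2 on the 'relatedObject' end.
print(multiplicity_ok(2, 1, 2))  # two related objects: within bounds
print(multiplicity_ok(3, 1, 2))  # three related objects: violates max
```

In the repository these checks would be enforced declaratively, as check constraints or triggers, rather than as application code.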