mmx metadata framework
...the DNA of your data
MMX Metadata Framework is a lightweight implementation of the OMG Meta Object Facility (MOF) built on relational database technology. The MMX Framework is based on three general concepts:
- Metamodel: The MMX Metamodel provides a storage mechanism for various knowledge models. The data model underlying the metadata framework is more abstract in nature than metadata models in general. The model consists of only a few abstract entities...
- Access layer: Object-oriented methods can be exploited, using inheritance to derive the whole data access layer from a small set of primitives created in SQL. The MMX Metadata Framework provides several diverse methods of data access to fulfill different requirements...
- Generic transformation: A large part of the relationships between different objects in a metadata model are too complex to be described through simple static relations. Instead, a universal data transformation concept is used, enabling the definition of transformations, mappings and transitions of any complexity...

Access Control Implementation in MMX Framework

June 30, 2009 10:55 by marx

Access Control in MMX Framework is implemented based on the principles of Role Based Access Control as defined in the standard specification ANSI INCITS 359-2004, Role Based Access Control. To be more precise, the implementation is based on Hierarchical RBAC, which adds support for role hierarchies to the Core RBAC component. Details about RBAC, both in general and in depth, can be found in the standard itself.

Quoting from the aforementioned document: "Core RBAC includes sets of five basic data elements called users (USERS), roles (ROLES), objects (OBS), operations (OPS), and permissions (PRMS). The RBAC model as a whole is fundamentally defined in terms of individual users being assigned to roles and permissions being assigned to roles. As such, a role is a means for naming many-to-many relationships among individual users and permissions.

A user is defined as a human being. Although the concept of a user can be extended to include machines, networks, or intelligent autonomous agents, the definition is limited to a person in this document for simplicity reasons. A role is a job function within the context of an organization with some associated semantics regarding the authority and responsibility conferred on the user assigned to the role. Permission is an approval to perform an operation on one or more RBAC protected objects. An operation is an executable image of a program, which upon invocation executes some function for the user. The types of operations and objects that RBAC controls are dependent on the type of system in which it will be implemented. 

Consistent with earlier models of access control an object is an entity that contains or receives information. The set of objects covered by RBAC includes all of the objects listed in the permissions that are assigned to roles."


Implementation. MMX Framework offers a natural means for implementing RBAC classes and associations as part of Core MMX: the RBAC Metamodel is simply another metamodel on the MMX M2 level. As is often the case with RBAC implementations, the possible values of Permission Type are enumerated as ALLOW (+), DENY (-) and (optionally) NOT_KNOWN (?), so both 'restriction based' and 'permission based' access control (and even a mix of the two) are possible.
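How ALLOW, DENY and NOT_KNOWN combine is left to the implementation. The Python sketch below assumes one common convention, not necessarily the MMX one: DENY takes precedence over ALLOW, and NOT_KNOWN defers to a default, so the default alone decides between permission-based and restriction-based behaviour. `PermissionType` and `is_permitted` are hypothetical names.

```python
from enum import Enum


class PermissionType(Enum):
    ALLOW = "+"
    DENY = "-"
    NOT_KNOWN = "?"


def is_permitted(permission_types, default_allow=False):
    """Resolve the Permission Types found for one (operation, object) pair.

    Assumed precedence (hypothetical, not taken from MMX): any DENY wins,
    otherwise any ALLOW grants access, otherwise the default applies.
    default_allow=False behaves as permission-based control,
    default_allow=True behaves as restriction-based control.
    """
    types = set(permission_types)
    if PermissionType.DENY in types:
        return False
    if PermissionType.ALLOW in types:
        return True
    # Only NOT_KNOWN entries, or no entries at all: fall back to the default.
    return default_allow


# Mixed mode: one role allows an operation, another role of the same user denies it.
print(is_permitted([PermissionType.ALLOW, PermissionType.DENY]))  # False
```

Flipping `default_allow` switches between the two access control styles without touching the stored permissions.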

It is assumed that user authentication is typically not part of a metadata application and that an external Active Directory (AD) service provides the verified identities (possibly with roles), so Users and Roles merely contain references to information in AD. The semantics of Operations are not defined by RBAC (see the quote above); it is the application's job to define and implement them. Therefore it is up to the application designer to decide the level of abstraction of the operations he/she would like to see tracked by Access Control, and how to interpret those Operations in application code.

Role hierarchies define an inheritance relation among roles. As stated in ANSI INCITS 359-2004, "The Hierarchical RBAC component adds relations for supporting role hierarchies. A hierarchy is mathematically a partial order defining a seniority relation between roles, whereby senior roles acquire the permissions of their juniors and junior roles acquire users of their seniors." In MMX terms, a role in the role 'tree' has all the permissions of the roles 'below' it, and all the users of the roles 'above' it.
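A minimal Python sketch of the two inheritance directions, assuming the hierarchy is stored as a hypothetical mapping from each role to its immediate junior roles: a role's effective permissions include those of all its juniors, and its effective users include those of all its seniors. The role, permission and user names are illustrative only.

```python
def juniors_of(role, juniors):
    """All roles below `role` in the hierarchy (transitively)."""
    seen, stack = set(), [role]
    while stack:
        for junior in juniors.get(stack.pop(), ()):
            if junior not in seen:
                seen.add(junior)
                stack.append(junior)
    return seen


def effective_permissions(role, juniors, assigned_permissions):
    """Permissions of `role` plus those of every junior role."""
    roles = {role} | juniors_of(role, juniors)
    return set().union(*(assigned_permissions.get(r, set()) for r in roles))


def effective_users(role, juniors, assigned_users):
    """Users of `role` plus those of every senior role."""
    seniors = {r for r in juniors if role in juniors_of(r, juniors)}
    return set().union(*(assigned_users.get(r, set()) for r in seniors | {role}))


# Hypothetical hierarchy: admin > editor > viewer
juniors = {"admin": ["editor"], "editor": ["viewer"]}
permissions = {"admin": {"delete"}, "editor": {"edit"}, "viewer": {"read"}}
users = {"admin": {"alice"}, "editor": {"bob"}, "viewer": {"carol"}}
print(sorted(effective_permissions("editor", juniors, permissions)))  # ['edit', 'read']
print(sorted(effective_users("editor", juniors, users)))              # ['alice', 'bob']
```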
In addition to Role hierarchies, the MMX Framework RBAC implementation treats Access Control Objects as hierarchies as well, enabling an application to exploit the hierarchy management functionality that is part of the Framework. It is therefore sufficient to denote the root or a subroot of a class hierarchy as an Access Control Object to have the whole hierarchy of classes assigned to a Permission or a Role. This enables an application to build an Access Control List easily, with full support from the Metadata API, which is part of the MMX Framework Access Layer.
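The effect of marking a subroot as an Access Control Object can be sketched in Python as a walk up the class hierarchy. The `acl` mapping (ACO class to permission names), the `parent_of` mapping and the class names are hypothetical stand-ins for what the Metadata API would return.

```python
def permissions_for(cls, acl, parent_of):
    """Permissions that apply to `cls` through Access Control Objects
    attached to the class itself or to any of its ancestors."""
    granted = set()
    node = cls
    while node is not None:
        granted |= acl.get(node, set())
        node = parent_of.get(node)  # None once the hierarchy root is passed
    return granted


# Hypothetical class hierarchy: RelationalObject is the root of Table and Column.
parent_of = {"Table": "RelationalObject", "Column": "RelationalObject"}
acl = {"RelationalObject": {"read"}, "Table": {"write"}}  # ACOs mark a root and a subroot
print(sorted(permissions_for("Column", acl, parent_of)))  # ['read']
print(sorted(permissions_for("Table", acl, parent_of)))   # ['read', 'write']
```

Only two ACL entries are needed to cover every class in the hierarchy, which is the point of treating Access Control Objects as hierarchies.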
An Access Control Object references a metamodel class (either as the root of a hierarchy or denoting a specific class directly) through a property or a set of properties. The properties currently used for this purpose include (but are not limited to):
- RootReference (root of a hierarchy is denoted by the class reference);
- RootModel/RootObjectType (root of a hierarchy is denoted by a combination of a metamodel name and a class name that uniquely identifies the root class);
- Reference (class reference identifies the class directly); 
- Model/ObjectType (a combination of a metamodel name and a class name identifies the class directly).
RootReference, RootModel, RootObjectType and Model have a multiplicity of 1, while the Reference and ObjectType properties can have an arbitrary number of instances; therefore a list of classes can be identified as a single Access Control Object.
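A rough Python model of the properties listed above, with the multiplicity-1 properties as optional scalars and Reference/ObjectType as lists. The field names, the 'Model.ObjectType' string format and the order in which the properties are checked are assumptions made for this sketch, not the MMX storage format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AccessControlObject:
    # Properties with a multiplicity of 1 (at most one value each)
    root_reference: Optional[str] = None
    root_model: Optional[str] = None
    root_object_type: Optional[str] = None
    model: Optional[str] = None
    # Properties that may occur any number of times
    references: List[str] = field(default_factory=list)
    object_types: List[str] = field(default_factory=list)

    def denoted_classes(self):
        """Return ("root", [...]) for a hierarchy root or ("classes", [...]) for direct classes.

        The order in which properties are checked is an assumption of this sketch.
        """
        if self.root_reference:
            return "root", [self.root_reference]
        if self.root_model and self.root_object_type:
            return "root", [f"{self.root_model}.{self.root_object_type}"]
        if self.references:
            return "classes", list(self.references)
        if self.model and self.object_types:
            return "classes", [f"{self.model}.{t}" for t in self.object_types]
        raise ValueError("Access Control Object does not identify any class")


aco = AccessControlObject(model="Relational", object_types=["Table", "View"])
print(aco.denoted_classes())  # ('classes', ['Relational.Table', 'Relational.View'])
```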


XDTL (eXtensible Data Transformation Language)

April 12, 2009 21:30 by marx

Traditional ETL (Extract Transform Load) tools broadly used in Data Warehouse environments tend to have two common deficiencies:

- emphasis on a graphical user interface (and the lack of a more efficient code interface) makes the design process slow and inflexible;

- a dedicated ETL server generally means one extra hop for the data being transferred, which might be unacceptable considering today's data volumes.

Enter XDTL (eXtensible Data Transformation Language). XDTL is an XML-based descriptive language designed for specifying data transformations from one database/storage to another. XDTL syntax is defined in an XML Schema document. The XML Schema of XDTL has semantic annotations linking it to the XDTL ontology model.

XDTL documents are interpreted by an XDTL Runtime Engine. XDTL and its Runtime Engine are built not around a slick IDE or a cool engine, but around an efficient language for describing data transformations. The goal is to produce a lightweight ETL development/runtime environment that handles most of the common requirements more efficiently than traditional jack-of-all-trades tools. The XDTL Runtime Engine is currently under development for both .NET and Linux environments. The XDTL language is free for anyone to use.

XDTL documents are stored in well-formed XML files that can be validated against the XDTL XML Schema. A package is the primary unit of execution that can be addressed by the runtime engine. A single XDTL document can contain several packages. Every package has a unique identifier in the form of a URI. There are also special non-executable packages (libraries) that serve as containers of tasks callable by other packages. A package contains an arbitrary number of Variables, an unordered collection of Connections and an ordered collection of Tasks.

Variables are name-value pairs mostly used for parameterization and are accessible by transformations. Connections are used to define data sources and targets used in transformations. Connections can refer to database resources (tables, views, result sets), text files in various formats (CSV, fixed format, Excel files) or Internet resources in tabular format.

Tasks are the smallest units of execution that have a unique identifier. A task is an ordered collection of Transformations that move, create or change data, and it contains at least one transformation. A task can be parameterized if one or more of its transformations have parameters; in that case all the parameters should have default values defined as package variables.
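The package structure described above can be modelled roughly as follows. This Python sketch is not the XDTL schema or runtime API; the class names and the `validate` method are hypothetical, and it only checks the rules stated here: unique task identifiers, at least one transformation per task, and a package variable providing a default for every transformation parameter.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Transformation:
    name: str
    parameters: List[str] = field(default_factory=list)


@dataclass
class Task:
    id: str
    transformations: List[Transformation]  # ordered; at least one expected


@dataclass
class Package:
    uri: str  # unique package identifier
    variables: Dict[str, str] = field(default_factory=dict)    # name -> value
    connections: Dict[str, str] = field(default_factory=dict)  # unordered
    tasks: List[Task] = field(default_factory=list)            # ordered

    def validate(self) -> List[str]:
        """Check the structural rules described above; return a list of problems."""
        problems = []
        seen_ids = set()
        for task in self.tasks:
            if task.id in seen_ids:
                problems.append(f"task {task.id}: duplicate identifier")
            seen_ids.add(task.id)
            if not task.transformations:
                problems.append(f"task {task.id}: no transformations")
            for transformation in task.transformations:
                for param in transformation.parameters:
                    if param not in self.variables:
                        problems.append(
                            f"task {task.id}: parameter {param} has no default variable")
        return problems


pkg = Package(
    uri="http://example.com/xdtl/load_customers",  # hypothetical URI
    variables={"load_date": "2009-04-12"},
    tasks=[Task("extract", [Transformation("copy", ["load_date"])]),
           Task("merge", [Transformation("upsert", ["batch_id"])])],
)
print(pkg.validate())  # ['task merge: parameter batch_id has no default variable']
```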

What sets XDTL apart from traditional ETL tools? 
- ETL tools in general focus on a graphical IDE and the entry-level user, while the needs of the professional user are not addressed: he/she has to struggle with an inefficient workflow. XDTL relies on XML as its development vehicle, making it easy to generate data transformation documents automatically or with XML tools of choice.

- as data volumes grow, the paradigm shifts from ETL to ELT, where the bulk of the transformations take place inside the (target) database. As a result, most of the fancy features provided by heavyweight ETL tools are rarely or never used, and the main workhorse is SQL. However, there is very little to boost the productivity of SQL generation and reuse. XDTL addresses this with metadata-based mappings and transformations served from a metadata repository, and by using transformation templates instead of SQL generation, capturing the typical scenarios in task libraries for easy reuse.
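To illustrate the template idea, here is a minimal Python sketch using `string.Template` from the standard library. The template text, placeholder names and the shape of the mapping record are all hypothetical; in XDTL the mapping would come from the metadata repository rather than a literal dict.

```python
from string import Template

# Hypothetical "insert new rows" template; the placeholders are filled from
# mapping metadata rather than generating SQL from scratch each time.
INSERT_NEW_ROWS = Template(
    "INSERT INTO $target ($columns)\n"
    "SELECT $columns FROM $source s\n"
    "WHERE NOT EXISTS (SELECT 1 FROM $target t WHERE t.$key = s.$key)"
)


def render(template, mapping):
    """Instantiate a transformation template from a mapping record (a plain dict here)."""
    return template.substitute(
        target=mapping["target"],
        source=mapping["source"],
        columns=", ".join(mapping["columns"]),
        key=mapping["key"],
    )


mapping = {"source": "stg.customer", "target": "dw.customer",
           "columns": ["id", "name"], "key": "id"}
print(render(INSERT_NEW_ROWS, mapping))
```

The same template can be reused for any table pair whose mapping is recorded in metadata; only the mapping record changes.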

- most heavyweight tools try to address every conceivable problem, which makes solving trivial tasks obscure and overly complex. They also aim to support every single database product, even though the chances of ever encountering most of them are almost zero. XDTL focuses on the most frequent scenarios and mainstream brands and puts the emphasis on productivity and efficiency.

- XDTL takes advantage of the general-purpose metadata repository of the MMX Framework, which targets a broad range of metadata-related activities and does not lock the user into an ETL-specific and ETL-only repository from <insert your ETL tool vendor>.


Trees and Hierarchies the MMX Way

March 30, 2009 22:34 by marx

Implementing trees and hierarchies in a relational database is an issue that has been puzzling many and has triggered numerous posts, articles and even some books on the topic. 

As stated by Joe Celko in Chapter 26, Trees [1]: "Unfortunately, SQL provides poor support for such data. It does not directly map hierarchical data into tables, because tables are based on sets rather than on graphs. SQL directly supports neither the retrieval of the raw data in a meaningful recursive or hierarchical fashion nor computation of recursively defined functions that commonly occur in these types of applications. <...> Since the nodes contain the data, we can add columns to represent the edges of a tree. This is usually done in one of two ways in SQL: a single table or two tables." The single-table representation enables one-to-many relationships via self-references (parent-child), while the more general two-table representation handles many-to-many relationships of arbitrary cardinality. Based on the principles of the Meta-Object Facility (MOF), MMX implements both the M1 (model) and M2 (metamodel) layers of abstraction. The two most important relationship types defined by UML, Generalization and Association, are both realized.

Generalization is defined on the M2 level and is implemented via the SQL self-relationship mechanism. Each class defined in M2 must belong to one class hierarchy, and only single inheritance is allowed. In terms of semantic relationship types in Controlled Vocabularies [2], this is an 'isA' relationship. Associations (as well as aggregations and compositions) are realized as a relationship table (an associative or 'join' table), allowing any class to be related to any other class with an arbitrary number of associations of different types (with support for mandatory and multiplicity constraints). This implementation enables straightforward translation of metamodels expressed as UML class diagrams into an equivalent representation as MMX M2 level class objects.
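The two representations can be sketched with SQLite through Python's `sqlite3` module: a self-referencing class table for Generalization and a relationship table for Associations. The table and column names are hypothetical and do not reproduce the actual MMX schema.

```python
import sqlite3

SCHEMA = """
-- Generalization: single table with a self-reference (single inheritance, 'isA').
CREATE TABLE m2_class (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE,
    parent_id INTEGER REFERENCES m2_class(id)
);
-- Associations: a relationship ('join') table, any class to any class.
CREATE TABLE m2_association (
    name             TEXT NOT NULL,
    from_class_id    INTEGER NOT NULL REFERENCES m2_class(id),
    to_class_id      INTEGER NOT NULL REFERENCES m2_class(id),
    is_mandatory     INTEGER NOT NULL DEFAULT 0,
    max_multiplicity INTEGER  -- NULL stands for '*'
);
"""


def associations_of(conn, class_name):
    """List (association, target class) pairs defined directly on a class."""
    return conn.execute("""
        SELECT a.name, t.name
        FROM m2_association a
        JOIN m2_class f ON f.id = a.from_class_id
        JOIN m2_class t ON t.id = a.to_class_id
        WHERE f.name = ?
        ORDER BY a.name
    """, (class_name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO m2_class VALUES (?, ?, ?)",
                 [(1, "RelationalObject", None), (2, "Table", 1), (3, "Column", 1)])
conn.executemany("INSERT INTO m2_association VALUES (?, ?, ?, ?, ?)",
                 [("hasColumn", 2, 3, 1, None), ("foreignKeyTo", 2, 2, 0, None)])
print(associations_of(conn, "Table"))  # [('foreignKeyTo', 'Table'), ('hasColumn', 'Column')]
```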

The M1 level deals with instances of M2 classes, and parent-child hierarchies here denote 'inclusion', 'broader-narrower' or structural relationships between objects (the 'partOf' relationship in the Controlled Vocabularies world). UML Links are implemented as a many-to-many relationship table, with both parent-child and link relationships being inherited from associations defined on the M2 level. This inheritance enables automatic validation of M1 models against M2 metamodels by defining general rules that enforce the integrity of models based on the characteristics of the respective metamodel elements.
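One such validation rule can be sketched in Python, assuming an M1 link of a given type is valid when an M2 association of the same type is defined between the linked objects' classes or any of their superclasses. The data structures and names are hypothetical, and multiplicity checks are left out for brevity.

```python
def lineage(cls, parent_of):
    """The class itself followed by its superclasses, up to the hierarchy root."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = parent_of.get(cls)
    return chain


def link_is_valid(link_type, source, target, class_of, parent_of, associations):
    """Accept an M1 link only if an M2 association of the same type exists
    between the objects' classes or any of their superclasses."""
    return any(
        (link_type, s, t) in associations
        for s in lineage(class_of[source], parent_of)
        for t in lineage(class_of[target], parent_of)
    )


parent_of = {"Table": "DataObject", "View": "DataObject"}
associations = {("hasColumn", "DataObject", "Column")}  # (type, from class, to class)
class_of = {"customer": "Table", "v_customer": "View", "cust_id": "Column"}
print(link_is_valid("hasColumn", "v_customer", "cust_id", class_of, parent_of, associations))  # True
print(link_is_valid("hasColumn", "cust_id", "customer", class_of, parent_of, associations))    # False
```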

Level | Single table (parent-child, one-to-many) | Two tables (relationship table, many-to-many)
M2 (metamodel) | Class hierarchy ('isA'), UML Generalization | UML Associations
M1 (model) | Object hierarchies ('whole-part'), UML Links | UML Links

There seems to be a huge controversy in the data management community over whether implementing hierarchies in SQL should employ the recursion support built into modern database systems. While a technique employing manual traversal and management of tree structures is proposed by Joe Celko in [1], the book dates from 1995, and meanwhile the world (and databases) have changed a bit. Recursion is now part of ANSI SQL-99, with most big players providing at least basic support for it, and in many cases the arguable performance gain of avoiding recursive processing gives way to the gain in ease and speed of application development that recursion brings.
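For comparison with manual traversal, a recursive common table expression retrieves a whole subtree in one query. This Python sketch runs against SQLite, which supports `WITH RECURSIVE`; the `node` table and its contents are hypothetical.

```python
import sqlite3


def subtree(conn, root_name):
    """Return (name, depth) for a node and all its descendants using a
    recursive common table expression (WITH RECURSIVE, SQL:1999)."""
    return conn.execute("""
        WITH RECURSIVE tree(id, name, depth) AS (
            SELECT id, name, 0 FROM node WHERE name = ?
            UNION ALL
            SELECT n.id, n.name, tree.depth + 1
            FROM node n JOIN tree ON n.parent_id = tree.id
        )
        SELECT name, depth FROM tree ORDER BY depth, name
    """, (root_name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT, "
             "parent_id INTEGER REFERENCES node(id))")
conn.executemany("INSERT INTO node VALUES (?, ?, ?)",
                 [(1, "DWH", None), (2, "Staging", 1), (3, "Mart", 1), (4, "Sales", 3)])
print(subtree(conn, "DWH"))  # [('DWH', 0), ('Mart', 1), ('Staging', 1), ('Sales', 2)]
```

The traversal logic lives entirely in the query; no application-side loop or nested-set bookkeeping is needed.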

MMX Framework encapsulates all the details of handling inheritance, traversing hierarchies, navigating linked object paths etc. in the MMX Metadata API, realized as a set of table functions (database functions that return a table as the result) that can be easily mapped by Object-Relational Mappers [3]. The performance penalty paid for recursion, which might be an issue in an enterprise-scale DWH, is not a concern here: after all, MMX Framework is designed for (and mostly used in) metadata management, where data amounts are not beyond comprehension.
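A table function returning a class together with its superclasses could look roughly like the query below. Python's `sqlite3` module cannot register user-defined table functions, so this sketch wraps the recursive query in a Python function and maps each row onto a dataclass, much as an Object-Relational Mapper would map a table function result. `class_ancestors`, `ClassRow` and the `m2_class` table are hypothetical, not part of the MMX Metadata API.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassRow:
    id: int
    name: str
    depth: int  # 0 = the class itself, 1 = its direct superclass, ...


def class_ancestors(conn, class_id):
    """Stand-in for a table function returning a class and its superclasses.
    Each result row is mapped onto a ClassRow object, as an ORM would do."""
    rows = conn.execute("""
        WITH RECURSIVE up(id, name, parent_id, depth) AS (
            SELECT id, name, parent_id, 0 FROM m2_class WHERE id = ?
            UNION ALL
            SELECT c.id, c.name, c.parent_id, up.depth + 1
            FROM m2_class c JOIN up ON c.id = up.parent_id
        )
        SELECT id, name, depth FROM up ORDER BY depth
    """, (class_id,))
    return [ClassRow(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE m2_class (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)")
conn.executemany("INSERT INTO m2_class VALUES (?, ?, ?)",
                 [(1, "Element", None), (2, "DataObject", 1), (3, "Table", 2)])
print([c.name for c in class_ancestors(conn, 3)])  # ['Table', 'DataObject', 'Element']
```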

[1] Celko, Joe. Joe Celko's SQL for Smarties: Advanced SQL Programming, 1995.

[2] Zeng, Marcia Lei. Construction of Controlled Vocabularies: A Primer (based on Z39.19), 2005.

[3] Ambler, Scott W. Mapping Objects to Relational Databases, 2000.