mmx metadata framework
...the DNA of your data
MMX metadata framework is a lightweight implementation of the OMG Meta Object Facility built on relational database technology. The MMX framework is based on three general concepts:
Metamodel | The MMX Metamodel provides a storage mechanism for various knowledge models. The data model underlying the metadata framework is more abstract than typical metadata models; it consists of only a few abstract entities.
Access layer | Object-oriented methods can be exploited, using inheritance to derive the whole data access layer from a small set of primitives created in SQL. The MMX Metadata Framework provides several diverse methods of data access to fulfill different requirements.
Generic transformation | A large part of the relationships between objects in a metadata model are too complex to be described through simple static relations. Instead, a universal data transformation concept is used, enabling the definition of transformations, mappings and transitions of any complexity.

XDTL: Handling JSON and XML, Part 1

March 19, 2012 10:39 by mmx

Semi-structured data has increasingly taken over the role of a universal container for data exchange. JSON, XML and other formats dominate an area that once belonged to fixed-format text files and CSV files. However, while CSV import and export is a no-brainer and a standard part of any data integration tool, JSON/XML can pose a real problem. Because the structure of a JSON/XML document is dynamic and sometimes quite complex, the traditional approaches ETL tools use for structured data often fail here.

XDTL takes a different approach. Instead of trying to handle any conceivable semi-structured data of arbitrary complexity with an extremely sophisticated toolset, XDTL focuses on a handful of the most important and most frequent scenarios, addressed with simple and lightweight tools. Three scenarios are considered:

1. The documents are small but have a complex structure that is known beforehand and is more or less static. This scenario mostly applies to feeds from the JSON/XML APIs popular in today's web environment, and suits the creation of a connector (adapter) to a specific web application.

2. The XML data is medium to large but has a simple hierarchical structure, usually no more than 2-3 levels deep. The structure may or may not be known beforehand. This scenario fits medium-size XML files where XML is used in place of the more traditional text (CSV) format.

3. The document is very big, possibly with a complex structure that never changes. This scenario would most probably require an XDTL extension (a library task written in XDTL). XDTL provides the means to create such an extension, which can take advantage of a variety of facilities, e.g. XSLT transformations, database platform-specific bulk load utilities and additional scripting, encapsulating and hiding all the complexity from the user.

To handle the first scenario, the xdtl:decode command is introduced. xdtl:decode takes a URI (referencing a data feed) or a document name as the source and decodes it into internal Java structures (Array, HashMap) that can then be easily shredded with for loops and .get('<name>') methods, and converted either into INSERT statements or into a series of flat files following the structure of the data.

Here's a simple code fragment of decoding JSON: 

<xdtl:decode source="https://gdata.youtube.com/feeds/api/standardfeeds/most_viewed?alt=json" target="root" type="JSON"/>
<xdtl:log msg="${root.get('feed').get('title').get('$t')}"/>
<xdtl:log msg="${root.get('feed').get('updated').get('$t')}"/>
<xdtl:for item="entry" rowset="${root.get('feed').get('entry')}">
    <xdtl:log msg="${entry.get('id').get('$t')}"/>
    <xdtl:log msg="${entry.get('title').get('$t')}"/>
</xdtl:for>

That'll do it. One xdtl:decode command and a loop or two will scrape the relevant data fields; the rest is up to the designer. There's no need to create XPath patterns or XSLT transformations. The same example can be elaborated to decode XML data:

<xdtl:decode source="https://gdata.youtube.com/feeds/api/standardfeeds/most_viewed" target="feed" type="XML"/>
<xdtl:log msg="${feed.get('title').get('textvalue')}"/>
<xdtl:log msg="${feed.get('updated')}"/>
<xdtl:for item="entry" rowset="${feed.get('entry')}">
    <xdtl:log msg="${entry.get('id')}"/>
    <xdtl:log msg="${entry.get('title').get('textvalue')}"/>
</xdtl:for>

Internally, XML is represented as Java Array and HashMap objects. XML element values are converted to "textvalue" properties.
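To make that internal representation concrete, here is a plain-Java sketch (not XDTL) of how the decoded feed might look in memory, following the conventions described above — repeated elements become a List, element values become "textvalue" entries. The field values are made up for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DecodeSketch {
    public static Map<String, Object> sampleFeed() {
        // <title>Most viewed</title> becomes a map with a "textvalue" property
        Map<String, Object> title = new HashMap<>();
        title.put("textvalue", "Most viewed");

        // repeated <entry> elements become a List of maps
        Map<String, Object> entry = new HashMap<>();
        entry.put("id", "video123");
        List<Object> entries = new ArrayList<>();
        entries.add(entry);

        Map<String, Object> feed = new HashMap<>();
        feed.put("title", title);
        feed.put("entry", entries);
        return feed;
    }

    public static void main(String[] args) {
        Map<String, Object> feed = sampleFeed();
        // mirrors ${feed.get('title').get('textvalue')} in the XDTL script
        System.out.println(((Map<?, ?>) feed.get("title")).get("textvalue")); // prints "Most viewed"
    }
}
```

This is why the XDTL expressions above read so naturally: each .get() call simply walks one level down this nested Map/List structure.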

Finally, here's a more complete example shredding the same YouTube feed into INSERT statements. Four tables (FEED, LINK, ENTRY and CATEGORY) are filled with data, complete with parent-child foreign keys.

<xdtl:decode source="https://gdata.youtube.com/feeds/api/standardfeeds/most_viewed" target="feed" type="XML"/>

<xdtl:query source="INSERT INTO FEED VALUES ('${feed.get('id')}', '${feed.get('updated')}', '${feed.get('title').get('textvalue')}', '${feed.get('author').get('name')}');"/>

<xdtl:for item="link" rowset="${feed.get('link')}">
    <xdtl:query source="INSERT INTO LINK VALUES ('${feed.get('id')}', '${link.get('type')}', '${link.get('href')}');"/>
</xdtl:for>

<xdtl:for item="entry" rowset="${feed.get('entry')}">
    <xdtl:query source="INSERT INTO ENTRY VALUES ('${feed.get('id')}', '${entry.get('id')}', '${entry.get('published')}', '${entry.get('title').get('textvalue')}');"/>
    <xdtl:for item="category" rowset="${entry.get('category')}">
        <xdtl:query source="INSERT INTO CATEGORY VALUES ('${entry.get('id')}', '${category.get('scheme')}', '${category.get('term')}');"/>
    </xdtl:for>
</xdtl:for>

Actually, nothing stops the script from shredding every single data field out of the XML feed with this method; it only takes a couple more xdtl:for loops. And last but definitely not least, the contents of the data can serve as sources for successive calls to other APIs with xdtl:decode inside a loop, enabling recursive traversal of several related APIs.
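The recursive-traversal pattern can be sketched like this (the URIs and field names here are hypothetical, not a real API):

```xml
<xdtl:decode source="https://example.com/api/feeds" target="feeds" type="JSON"/>
<xdtl:for item="feed" rowset="${feeds.get('items')}">
    <!-- a value from the outer feed builds the URI of the next API call -->
    <xdtl:decode source="https://example.com/api/feeds/${feed.get('id')}" target="detail" type="JSON"/>
    <xdtl:log msg="${detail.get('title')}"/>
</xdtl:for>
```

Each level of the traversal is just another xdtl:decode nested inside the loop over the previous level's rowset.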



Metadata materialization, part 5: Building an automated materialization engine

December 1, 2011 10:00 by Viktor Varlamov

In this last part of the data materialization series, we look at building an automated process for materializing all (or part) of the metadata stored in MMX.

The proposed automated process is a cycle over object types that utilizes the results from parts 2, 3 and 4.

By definition, only instances of non-abstract classes can exist in MMX, so when looking for object types to create views for, we should exclude abstract ones.

SELECT 
	object_type_cd, 
	object_type_nm
FROM
	MD_OBJECT_TYPE
WHERE
	abstract_ind = 0

Additionally, we would like to exclude all enumeration object types from the materialization process. Identifying an enumeration object type is trickier, so let's agree on the following definition: an enumeration object type has no property types and no relation types, descends directly from the Metamodel class, and has no child class types, i.e.

...
AND object_type_cd NOT IN (
	SELECT 
		thisType.object_type_cd
	FROM 
		MD_OBJECT_TYPE thisType
	JOIN
		MD_OBJECT_TYPE parentType 
		ON parentType.object_type_cd = thisType.parent_object_type_cd
		AND parentType.parent_object_type_cd = mmxmd.object_type_code('MMXFramework', 'Metamodel')
	LEFT OUTER JOIN
			(SELECT 
				object_type_cd, 
				count(property_type_cd) prop_count
			FROM
				MD_PROPERTY_TYPE
			GROUP BY object_type_cd) propTypes
		ON propTypes.object_type_cd = thisType.object_type_cd
	LEFT OUTER JOIN
			(SELECT 
				object_type_cd, 
				count(relation_type_cd) rel_count
			FROM
				MD_RELATION_TYPE
			GROUP BY object_type_cd) fromRelTypes
		ON fromRelTypes.object_type_cd = thisType.object_type_cd
	LEFT OUTER JOIN
			(SELECT 
				related_object_type_cd object_type_cd, 
				count(relation_type_cd) rel_count
			FROM
				MD_RELATION_TYPE
			GROUP BY related_object_type_cd) toRelTypes
		ON toRelTypes.object_type_cd = thisType.object_type_cd
	LEFT OUTER JOIN 
			(SELECT 
				parent_object_type_cd object_type_cd,
				count(object_type_cd) child_count
			FROM 
				MD_OBJECT_TYPE
			GROUP BY parent_object_type_cd) childTypes
		ON childTypes.object_type_cd = thisType.object_type_cd
	WHERE 
		  propTypes.object_type_cd IS NULL
		  AND fromRelTypes.object_type_cd IS NULL
		  AND toRelTypes.object_type_cd IS NULL
		AND childTypes.object_type_cd IS NULL)
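Putting it together, the engine is a loop over the filtered type list that invokes the view-generation logic from the earlier parts for each type. A procedural sketch (the procedure names here are hypothetical placeholders for the DDL-generating queries from parts 2-4):

```sql
-- Sketch of the driving cursor; generate_object_view and
-- generate_relation_views stand in for the logic of parts 2-4.
FOR r IN (
    SELECT object_type_cd, object_type_nm
    FROM MD_OBJECT_TYPE
    WHERE abstract_ind = 0
      AND object_type_cd NOT IN ( /* enumeration exclusion above */ )
)
LOOP
    -- parts 2 and 3: materialize objects and properties
    CALL generate_object_view(r.object_type_cd);
    -- part 4: materialize many-to-many relations
    CALL generate_relation_views(r.object_type_cd);
END LOOP;
```

The exact cursor syntax depends on the database platform; the point is that one pass over the type list regenerates every materialized view.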

Complete list of articles in the materialization series:

Part 1: Materialization choices
Part 2: Materializing objects and properties
Part 3: Materializing objects and properties: what was left behind
Part 4: Materializing many-to-many relations
Part 5: Building an automated materialization engine



Tips and tricks: Rowset for xdtl:for

November 30, 2011 09:38 by Viktor Varlamov

One of the main uses of xdtl:for is looping through a rowset produced by xdtl:fetch. Observe:

<xdtl:fetch source="SELECT object_id, object_nm FROM MD_OBJECT WHERE object_type_cd = 1" rowset="metamodels" connection="$cn" />
<xdtl:for item="metamodel" rowset="metamodels">
	<xdtl:log msg="Found metamodel named ${metamodel[1]} with id ${metamodel[0]}" />
</xdtl:for>
 

Before we get to the point, one little thing: check that the returned rowset is not empty:

<xdtl:if expr="${metamodels.isEmpty()}">
	<xdtl:error msg="No metamodels found" />
</xdtl:if>
 

Often the rowset may be passed in an input parameter as a comma-separated list. In that case you can still loop through it by using Java's split function. Observe that you have to convert the split array into a list:

<xdtl:send source="Metamodel1,Metamodel2,Metamodel3" target="metamodel_list" overwrite="1" />
<xdtl:for item="metamodel" rowset="${java.util.Arrays.asList(metamodel_list.split(/,/))}">
	<xdtl:log msg="Found metamodel named ${metamodel[0]}" />
</xdtl:for>