mmx metadata framework
...the DNA of your data
MMX metadata framework is a lightweight implementation of the OMG Meta-Object Facility (MOF) built on relational database technology. The MMX framework is based on three general concepts:
Metamodel | MMX Metamodel provides a storage mechanism for various knowledge models. The data model underlying the metadata framework is more abstract in nature than metadata models in general. The model consists of only a few abstract entities.
Access layer | Object-oriented methods can be exploited, using inheritance, to derive the whole data access layer from a small set of primitives created in SQL. MMX Metadata Framework provides several different methods of data access to fulfill different requirements.
Generic transformation | Many of the relationships between objects in a metadata model are too complex to be described through simple static relations. Instead, a universal data transformation concept is used, enabling the definition of transformations, mappings and transitions of any complexity.

XDTL Runtime: Alive and Kicking

March 22, 2010 11:58 by marx

First, a quick recap. XDTL (http://xdtl.org) is an XML-based descriptive language designed for specifying data transformations from one database or storage system to another. XDTL syntax is defined in an XML Schema document. XDTL Runtime is a lightweight ETL runtime environment that efficiently and with zero overhead handles most of the typical ETL needs. XDTL Runtime can generate SQL or SAS scripts (or any other executable instructions for that matter) based on templates processed by a template engine.

Now, the 'news' part. XDTL Runtime Version 1.0 (XDTL RT 1.0) is finished and running live! The runtime is written in Java (Sun JRE 1.6 required) and uses Velocity (http://velocity.apache.org/) for template processing. So here's a short primer.

There are two individually addressable units of execution in XDTL: packages and tasks. A package is a container of tasks, and both can be invoked by name. A task consists of steps, which denote individual commands that are executed sequentially and cannot be addressed individually. Besides tasks, a package contains three collections: parameters, variables and connections. As in XSLT, $ denotes dereferencing: during execution, everything starting with $ is dereferenced and substituted with a value. In addition, everything between { and } is treated as a JavaScript expression and evaluated.
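The package/task/step structure and $ dereferencing can be pictured with a small Python model. This is only an illustrative sketch, not XDTL RT code: the class names, the step strings and the idea of merging the three collections into one lookup scope are all assumptions made for the example, and the { } JavaScript evaluation is not modelled at all.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class Task:
    name: str
    steps: list = field(default_factory=list)  # commands, executed in order


@dataclass
class Package:
    name: str
    parameters: dict = field(default_factory=dict)
    variables: dict = field(default_factory=dict)
    connections: dict = field(default_factory=dict)
    tasks: dict = field(default_factory=dict)  # task name -> Task

    def run_task(self, task_name):
        """Dereference every $name in the steps of the named task."""
        # Assumption: all three collections share one lookup scope.
        scope = {**self.parameters, **self.variables, **self.connections}
        task = self.tasks[task_name]  # tasks are addressable by name
        return [Template(step).substitute(scope) for step in task.steps]


# Hypothetical package; the step text is made-up pseudo-syntax, not XDTL.
pkg = Package(
    name="load_customers",
    parameters={"filename": "customers.csv"},
    variables={"workdir": "/tmp/xdtl"},
    connections={"staging": "jdbc:postgresql://localhost/staging"},
    tasks={
        "fetch": Task("fetch", [
            "get $filename into $workdir",
            "read $workdir/$filename into $staging",
        ]),
    },
)

resolved = pkg.run_task("fetch")
for line in resolved:
    print(line)
```

Steps have no names of their own here, which mirrors the point above: only the package and its tasks can be addressed, while steps simply run in sequence.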

There are file commands and database commands in XDTL. File commands usually launch operating system processes and are designed to handle files (move, compress, transfer etc.). File commands in XDTL RT 1.0 are:

get: downloads a file with a file transfer utility (ftp, scp, sftp)
put: uploads a file with a file transfer utility (ftp, scp, sftp)
unpack: unpacks a file with an archival utility (zip, arj, tar etc.)
pack: packs a file with an archival utility (zip, arj, tar etc.)
strip: cleanses a text file, e.g. with a stream/regex utility (sed etc.)
move: moves files to another directory, usually for archival with a timestamp
clear: removes everything related to a file/task from a working directory
log: adds a line to standard log output
exec: executes an arbitrary operating system command line (shell) command

Database commands control database operations:

read: transfers a text file into a database table, usually in a staging area
write: transfers a database table into a text file
load: configured to load a file into a database table with a bulk load utility
dump: configured to dump a database table into a file with a bulk dump utility
transaction: wrapper for transaction processing
query: executes a database command (an SQL statement, a SPARQL query etc.)

Then come some control flow commands:

call: invokes another package or another task passing required parameters
if: adds basic conditional control

Finally, while the file and database commands constitute the backbone of XDTL, its heart and soul are the mappings and render commands:

mappings: define or retrieve metadata used to instantiate a procedure (SQL, SAS etc.)
render: merges a code template with corresponding mappings to produce executable SQL statement(s), SAS procedure(s) or any other form of executable code

Mappings and templates are fully decoupled and 'know nothing about each other': it's only during the rendering step that they meet and merge into an executable script or set of instructions. This opens up a lot of new opportunities: a specific set of mappings can be used with different templates for different purposes, a specific template can be reused many times with different mappings to handle different data sources, a mapping representing some business logic can be ported to another platform by rendering it with another template, and so on. Splendid!
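To make the decoupling concrete, here is a minimal Python sketch of the same idea. It is an assumption-laden illustration, not the XDTL implementation: XDTL RT uses Velocity, while this example swaps in Python's string.Template, and the mapping layout, table names and the render function are all hypothetical.

```python
from string import Template

# Mapping: plain metadata that knows nothing about any template.
# (Hypothetical shape: target column -> source expression.)
customer_mapping = {
    "target": "dw.customer",
    "source": "staging.customer_raw",
    "columns": {"id": "cust_id", "name": "TRIM(cust_name)"},
}

# Templates: code skeletons that know nothing about any specific mapping.
INSERT_SELECT = Template(
    "INSERT INTO $target ($target_cols)\n"
    "SELECT $source_exprs\n"
    "FROM $source;"
)
CREATE_VIEW = Template(
    "CREATE VIEW $target AS\n"
    "SELECT $aliased\n"
    "FROM $source;"
)


def render(template, mapping):
    """Merge one template with one mapping into executable text."""
    cols = mapping["columns"]
    return template.substitute(
        target=mapping["target"],
        source=mapping["source"],
        target_cols=", ".join(cols),
        source_exprs=", ".join(cols.values()),
        aliased=", ".join(f"{expr} AS {col}" for col, expr in cols.items()),
    )


# The same mapping rendered with two different templates.
insert_sql = render(INSERT_SELECT, customer_mapping)
view_sql = render(CREATE_VIEW, customer_mapping)
print(insert_sql)
print(view_sql)
```

Reusing INSERT_SELECT with a second mapping dictionary would cover the other direction: one template, many data sources.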



Knowledge Management feat. Wiktionary

March 12, 2010 12:55 by marx

Wiktionary is about Knowledge Management.

Although the term itself has been around for ages, it would probably be hard to find two people who would agree on what it stands for precisely. Knowledge management has come a long way, from the huge hierarchical file systems full of text files of the 1970s, to the dedicated document management systems of the 1980s, to the enterprise portals, intranets and content management systems of the 1990s. However, it has always been a balancing act between strengths and weaknesses in particular areas, trying to get the mix of collaborative, structural and navigational facets right.

As we see it, there are two burning issues in building a knowledge management infrastructure: how to define and access the knowledge we want to manage, and how to store the knowledge we have created or defined.

Regarding the first question, the keywords are collaborative effort in knowledge creation, and intuitive, effortless navigation during knowledge retrieval. On today's internet, one of the most successful technologies of the Web 2.0 era is Wikipedia or, more generally, the wiki. It is arguably the easiest to use, most widely recognised and probably the cheapest to build way of letting a huge number of very different people, located all over the world, efficiently access and manage an unimaginably vast amount of complex and disparate information. So we found it to be good and put it to use.

A simple way to define knowledge management is: it's about things (concepts, ideas, facts etc.) and the relationships between them. In today's internet-based world we probably have most (or at least a big share) of the data, facts and figures we will ever need freely available to us, anytime, anywhere. So it's not about the existence of, or access to, data; it's about navigating and finding it. The relationships are as important as, and sometimes even more important than, the related items themselves. More than that, relationships tend to carry information of their own, which might be even more significant than the information carried by the related items.

Which brings us to the semantics (meaning) of the relationships. In Wikipedia (and on the Internet in general) links carry only one universal meaning: we can navigate from here to there. A human being clicking on a link has to guess the meaning and significance of the link, and he/she does this by using a combination of intuition, experience and creativity. However, this is a pretty limited and inefficient way to associate things with each other. Adding semantics to relationships enables us to understand why and how various ideas, concepts, topics and terms are related. Some very obvious examples: 'synonym', 'antonym', 'part of', 'previous version', 'owner', 'creator'. The mindshift towards technologies with semantically 'richer' relations is visible in the evolution from classifications to ontologies, from XML to RDF etc.

Finally, simply by enumerating things and the relationships between them we have created a model, which forces us to think 'properly': we only define concepts and ideas that are meaningful in our domain of interest, and we only define relationships that are actually allowed and possible between those concepts and ideas. A model validates all our proceedings and forces us to 'do right things'. Wiktionary employs this approach as the cornerstone of its technology; in fact, the metamodel acting as the base of Wiktionary houses a multitude of different models, enabling Wiktionary to support management of knowledge in disparate subject domains simultaneously and even have links between concepts belonging to different domains. So, regarding our second issue, the metamodel defines a structured storage mechanism for our knowledge repository.
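The idea of a model that only admits allowed, named relationships can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the MMX metamodel: the Metamodel and Repository classes, the concept types and the relation names are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Metamodel:
    # Allowed (source type, relation name, target type) triples.
    allowed: set = field(default_factory=set)

    def allow(self, source_type, relation, target_type):
        self.allowed.add((source_type, relation, target_type))


@dataclass
class Repository:
    metamodel: Metamodel
    types: dict = field(default_factory=dict)      # concept -> type
    relations: list = field(default_factory=list)  # (source, relation, target)

    def add_concept(self, name, concept_type):
        self.types[name] = concept_type

    def relate(self, source, relation, target):
        """Store a named relation only if the metamodel permits it."""
        triple = (self.types[source], relation, self.types[target])
        if triple not in self.metamodel.allowed:
            raise ValueError(f"relation not allowed by the model: {triple}")
        self.relations.append((source, relation, target))

    def related(self, source, relation):
        """Follow one kind of relation from a concept."""
        return [t for s, r, t in self.relations
                if s == source and r == relation]


model = Metamodel()
model.allow("Term", "synonym", "Term")
model.allow("Term", "owner", "Department")

repo = Repository(model)
repo.add_concept("client", "Term")
repo.add_concept("customer", "Term")
repo.add_concept("Finance", "Department")

repo.relate("client", "synonym", "customer")
repo.relate("client", "owner", "Finance")
print(repo.related("client", "synonym"))
```

An attempt such as repo.relate("Finance", "synonym", "customer") raises ValueError, because the model never declared that a Department can be a synonym of a Term.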

In the data processing world, there has always been tension between structured and unstructured data. Structured data is good for computers, and can be managed and processed very efficiently. However, we humans tend to think in an unstructured way, and most of us feel very uncomfortable when forced to squeeze the way we do things into rigid, structured patterns. Wiktionary aims to bridge those two opposites by building on a well-defined underlying structure while providing a comfortable, unstructured user experience. We have two rather conflicting goals, and the approach we have taken, Wiktionary, is arguably the cheapest route to reaching both of them.



MMX Wiktionary: A Wiki With An Attitude

November 17, 2009 14:45 by kalle

MMX Wiktionary is a web-based collaborative application, built on top of the MMX Metadata Framework, that provides a semantic, wiki-like user interface for metadata creation and management. The main idea behind MMX Wiktionary is to combine a structured, metamodel-driven, universal metadata repository with a wiki user interface. This combination lets users see and work with complicated metadata structures as conventional pages, without losing the formalization required by the defined metamodel. At the same time, nothing prevents using Wiktionary for loosely formalized content creation, such as document management, with a predefined open schema/metamodel approach when needed. While it seems easier to start without modeling, we do not see that as promising from an organizational metadata perspective. Our moderately modeled approach brings guided metadata creation to every end user, in an intuitive and simplified form. We do not sacrifice the semantics encoded in the metamodel for the sake of simplicity and usability. Content-dependent classification of pages, hierarchy management, and the extraction and linking of named relations and properties while text is being written are some examples of this mashup of usability and semantics.

The editor user interface is one of the biggest challenges in our wiki initiative. To avoid wiki's markup mess we use a WYSIWYG editor for rich content formatting and directed metadata creation. The editor is meant for end users who grew up with a mark-and-click editing style and either do not know or remember how text creation was "programmed" in WordStar or WordPerfect, or do not have extensive "writing in Wikipedia" experience. The created text is parsed on save and stored in the metadata repository in the structured form defined by the model. The rich formatting is stored in the text body using basic HTML markup, which is interpreted by the browser and the editor during reading and writing. Defined properties and created links are extracted from the text and stored in metadata structures as property values or relations between objects. Besides being part of the saved text, the markup is the connection mechanism between the text and the stored properties and relations: it gives a layout and presentation dimension to the captured metadata while preserving its structure and machine processability.
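The parse-on-save step can be pictured roughly as follows. This Python sketch rests entirely on assumptions: the way Wiktionary actually marks properties and named relations in its HTML is not described here, so the rel attribute on links and the data-property attribute on spans are invented conventions, and PageParser is a hypothetical name.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collect named relations and property values from a saved page body."""

    def __init__(self):
        super().__init__()
        self.relations = []    # (relation name, target page)
        self.properties = {}   # property name -> text value
        self._current_property = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            # Assumed convention: rel carries the relation name.
            self.relations.append((attrs.get("rel", "link"), attrs["href"]))
        elif tag == "span" and "data-property" in attrs:
            # Assumed convention: data-property names the property.
            self._current_property = attrs["data-property"]
            self.properties[self._current_property] = ""

    def handle_data(self, data):
        if self._current_property is not None:
            self.properties[self._current_property] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._current_property = None


body = (
    '<p>An <span data-property="status">approved</span> term, '
    'see <a rel="synonym" href="Client">Client</a>, '
    'owned by <a rel="owner" href="Finance">Finance</a>.</p>'
)

parser = PageParser()
parser.feed(body)
print(parser.relations)
print(parser.properties)
```

The HTML body itself would still be stored as the page text; the extracted relations and properties are what would go into the structured part of the repository.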

Some keywords and topics in our Wiktionary initiative that keep us busy:

  • wiki style ui and wysiwyg editor
  • usability and semantics, integration of user interface and metadata
  • community driven content tagging for business glossary creation
  • page templates and metadata driven layout
  • history and versioning
  • discussion forum and commentaries
  • import and export