
Bifurcation of Advanced Tabs in TI

Posted: Tue Aug 23, 2016 10:38 pm
by ttesla
Hi,

This post is meant to improve my understanding of why separate tabs are available for writing TI code. The IBM documentation clearly states what each tab does, but I never understood why separate tabs are provided. I am under the impression that separating the tabs makes the code much easier to manage than writing it in a single stretch as in other programming languages (procedural languages like C, C++ etc.), but many of my colleagues do not believe my argument. One of my close friends told me that the TM1 engine is built on OLTP principles like Cross Join. He said that, using the Cross Join concept, TM1 shows multiple products, versions etc. at one time. When I heard his view, I got confused about why the Data tab and Metadata tab run simultaneously for each record, and why we cannot write metadata structures in the Data tab (something like this can be achieved by using DimensionElementInsertDirect). I would like to get more insights from experts on this topic.

Thanks
Tesla
Note: I tried searching the forum for this or a related topic but could not find anything. If anyone finds a relevant topic, please let me know.

Re: Bifurcation of Advanced Tabs in TI

Posted: Wed Aug 24, 2016 7:48 am
by lotsaram
ttesla wrote:Note: I tried searching the forum for this or a related topic but could not find anything. If anyone finds a relevant topic, please let me know.
I suspect this is because you might be using the wrong terms. Honestly, reading your question I can't really make out what you are asking. I can make a GUESS but I don't know that it's the right one.

General Purpose of "Advanced" Tabs in TurboIntegrator (this much you should at least get from the manuals):
Prolog: where all actions are scripted that are done before the data source is opened
Metadata: where changes to dimensions are scripted, relevant only for processes where a data source is defined. Code here is repeated for each data source record.
Data: where updates to model data are scripted, relevant only for processes where a data source is defined. Code here is repeated for each data source record.
Epilog: where "tidy up" actions are scripted, if required. Executed after the data source is closed.

I am 100% certain that the above has been discussed multiple times on this forum. It is also stated in the TurboIntegrator guide.

In a lot of ETL tools which query OLAP data, the "data source" is defined in the code and EXPLICITLY ITERATED within the code. In TM1's TurboIntegrator script language there is no concept of iterating a view object; this is done IMPLICITLY, not EXPLICITLY. This is what the Metadata and Data tabs do. Yes, you could explicitly iterate a portion of a cube by nesting while loops over subsets of the cube's dimensions, but generally this isn't done: there is no way to skip null records, and you rapidly run into an exponential explosion of data points. It is much better to set the view data source property to skip null records (if the data source type is VIEW) and use the "implicit loops" of the Metadata and Data tabs.
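To put rough numbers on the explosion problem, here is a minimal Python sketch (not TI code) that compares the two approaches on a toy cube. The dimension names, sizes and cell values are all made up for illustration; the dictionary just stands in for a view with null suppression switched on.

```python
from itertools import product

# Hypothetical toy cube: three dimensions, only three populated cells.
products = [f"P{i}" for i in range(100)]
versions = ["Actual", "Budget", "Forecast"]
months = [f"M{i:02d}" for i in range(1, 13)]

cells = {
    ("P1", "Actual", "M01"): 120.0,
    ("P7", "Budget", "M03"): 45.5,
    ("P42", "Actual", "M12"): 9.0,
}

# Explicit iteration: nested loops visit every combination of elements,
# whether or not the cell holds a value.
explicit_visits = sum(1 for _ in product(products, versions, months))

# Implicit iteration with nulls skipped: only populated cells are
# handed to the per-record code.
implicit_visits = sum(1 for _ in cells)

print(explicit_visits, implicit_visits)
```

Every extra dimension multiplies the explicit count by the size of its subset, while the implicit count only grows with the number of populated cells.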

You have some false assumptions:
Metadata and Data code is NOT executed simultaneously. These are executed SEQUENTIALLY. That is, the data source is read 2x in its entirety. Note this only happens if there is script written on the tab. If the tab contains no code then the data source reading for that tab is skipped. This is important to realize and take into account for large data sets; if there are no Metadata updates to do then don't have any code (even declaration of variables) in the tab, otherwise the data source is iterated unnecessarily.
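The sequential behaviour can be sketched in Python as a toy model. This is only an illustration of the order described above, not how TM1 is implemented; `run_process` and its parameters are hypothetical names, and passing `None` stands in for an empty tab.

```python
def run_process(records, metadata_code=None, data_code=None):
    """Toy model of tab order. Returns (log, number of record reads)."""
    log = ["prolog"]
    reads = 0
    # The source is iterated once per non-empty tab, one tab after the other.
    for name, code in (("metadata", metadata_code), ("data", data_code)):
        if code is None:
            continue  # empty tab: this pass over the source is skipped
        for record in records:
            reads += 1
            code(record)
        log.append(f"{name} x{len(records)}")
    log.append("epilog")
    return log, reads


source = ["rec1", "rec2", "rec3"]

# Code on both tabs: the source is read twice in full.
both_log, both_reads = run_process(source, lambda r: None, lambda r: None)

# Code on the Data tab only: a single pass.
data_log, data_reads = run_process(source, data_code=lambda r: None)

print(both_log, both_reads)
print(data_log, data_reads)
```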

Why is there a Metadata tab at all? This is in case there are new dimension elements: the elements must of course be added to the dimensions and the changes committed/compiled before data can be loaded against said new elements. Where there are dimension changes, TM1 creates a temp copy of the edited dimension and all changes are made to the temp object. Metadata changes are committed 2 possible times 1/ at the completion of the Prolog and 2/ at the completion of the Metadata (after closing the data source the 1st time). Any changes to dimensions coded on the Data or Epilog are ineffective since there is no commit where the temp copy dimension is saved and replaces the original. Note the "Direct" dimension functions are relatively new; these functions allow dimensions to be edited directly (without creating the shadow copy and the required commit). Therefore these can be used on any tab. But there are some performance watchouts if using these. When to use the standard dimension update functions and when to use the direct functions is a separate topic.
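The temp copy and commit idea can be modelled in a few lines of Python. This is a deliberately simplified sketch, not TM1's internals: the `Dimension` class and its method names are hypothetical, and having a direct insert also update a pending temp copy is just a choice made to keep the toy consistent.

```python
class Dimension:
    """Toy dimension with a live element set and an optional temp copy."""

    def __init__(self, elements):
        self.elements = set(elements)  # what data loads can see
        self._shadow = None            # temp copy holding pending edits

    def insert(self, element):
        # Standard-style insert: edits go to the temp copy only.
        if self._shadow is None:
            self._shadow = set(self.elements)
        self._shadow.add(element)

    def insert_direct(self, element):
        # Direct-style insert: the live dimension changes immediately.
        self.elements.add(element)
        if self._shadow is not None:
            self._shadow.add(element)  # toy-model choice, see lead-in

    def commit(self):
        # The temp copy replaces the original.
        if self._shadow is not None:
            self.elements = self._shadow
            self._shadow = None


dim = Dimension(["A"])

dim.insert("B")                        # e.g. on the Metadata tab
visible_before_commit = "B" in dim.elements
dim.commit()                           # the commit after Metadata
visible_after_commit = "B" in dim.elements

dim.insert("C")                        # e.g. on the Data tab, no commit follows
late_insert_visible = "C" in dim.elements

dim.insert_direct("D")                 # direct edit, usable on any tab
direct_insert_visible = "D" in dim.elements

print(visible_before_commit, visible_after_commit,
      late_insert_visible, direct_insert_visible)
```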

Re: Bifurcation of Advanced Tabs in TI

Posted: Wed Aug 24, 2016 9:24 am
by qml
ttesla wrote:I am under the impression that separating the tabs makes the code much easier to manage than writing it in a single stretch as in other programming languages (procedural languages like C, C++ etc.), but many of my colleagues do not believe my argument.
I have to say I don't agree with your argument either. First and foremost the split into tabs is functional and there are good reasons for it, which lotsa eloquently explained. Any impact this has on code management is an afterthought.
ttesla wrote:One of my close friends told me that the TM1 engine is built on OLTP principles like Cross Join. He said that, using the Cross Join concept, TM1 shows multiple products, versions etc. at one time.
What your close friend told you is quite a vague statement. If interpreted literally, then it's not an accurate representation of TM1's engine. TM1 has certain traits of an OLTP system (concurrency, recoverability, speed) but lacks some others (e.g. high availability and resilience). It makes little sense to use the term 'cross join' in the context of TM1, because it is not a relational database and does not include one. It is a true multidimensional database (MOLAP) and does not do any joins even internally, unlike ROLAP. Data is stored in digital trees, which are logically represented as cubes with between 2 and 255 dimensions; the dimensions together uniquely define the address of each cell. There is no underlying star schema and there are no relational joins.
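To illustrate addressing a cell by one element per dimension with no joins involved, here is a small Python sketch that uses nested dictionaries as a stand-in for a digital tree. It is only an analogy; TM1's actual storage layout is not described in this thread, and the function names and sample elements are hypothetical.

```python
def set_cell(tree, address, value):
    """Store a value under a full address (one element per dimension)."""
    node = tree
    for element in address[:-1]:
        node = node.setdefault(element, {})  # create the branch on demand
    node[address[-1]] = value


def get_cell(tree, address):
    """Walk the tree one dimension at a time; unpopulated cells read as 0."""
    node = tree
    for element in address:
        if not isinstance(node, dict) or element not in node:
            return 0.0
        node = node[element]
    return node


cube = {}
set_cell(cube, ("Bikes", "Actual", "Jan"), 120.0)
set_cell(cube, ("Bikes", "Budget", "Jan"), 100.0)

populated = get_cell(cube, ("Bikes", "Actual", "Jan"))
empty = get_cell(cube, ("Cars", "Actual", "Jan"))

print(populated, empty)
```

The lookup is a walk down one branch per dimension, and only populated cells take up space.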
ttesla wrote:When I heard his view, I got confused about why the Data tab and Metadata tab run simultaneously for each record, and why we cannot write metadata structures in the Data tab (something like this can be achieved by using DimensionElementInsertDirect).
As lotsa already said, the tabs do not get executed simultaneously. There are good historical/logical reasons for the Metadata and Data tabs being split. One might want to (or even need to) update dimensions from the same source that data comes from. Any dimension changes need to be applied and dimensions recompiled before any new elements can be used to store data against. The Direct metadata functions that have been added recently have some traits of being a hack. When these functions are used the dimension is not recompiled as a whole - this has upsides as well as downsides and I would still recommend doing it the traditional way except for a few well defined use cases.
lotsaram wrote:Metadata changes are committed 2 possible times 1/ at the completion of the Prolog and 2/ at the completion of the Metadata (after closing the data source the 1st time).
Just to clarify this statement as it can be misinterpreted. There is only one moment in the execution of a TI process where metadata changes are committed. In a process that has a data source this happens between Metadata and Data tabs - all changes made in Prolog and Metadata are collected and committed together. In a process without a data source this happens between Prolog and Epilog (there is no Data/Metadata).
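As a compact way to read the clarification above, this Python sketch lists the phases for both kinds of process and checks whether the commit has already happened when a given tab runs. The function names and the "COMMIT" marker are hypothetical labels used for illustration only.

```python
def execution_phases(has_data_source):
    """Order of tabs with the single metadata commit point marked."""
    if has_data_source:
        return ["Prolog", "Metadata", "COMMIT", "Data", "Epilog"]
    # Without a data source there are no Metadata/Data passes.
    return ["Prolog", "COMMIT", "Epilog"]


def committed_before(phase, has_data_source):
    """True if dimension changes are already committed when `phase` runs."""
    phases = execution_phases(has_data_source)
    return phases.index("COMMIT") < phases.index(phase)


# With a data source: Prolog and Metadata edits are committed together,
# so they are not yet live during Metadata but are live during Data.
live_in_metadata = committed_before("Metadata", has_data_source=True)
live_in_data = committed_before("Data", has_data_source=True)

# Without a data source: the commit sits between Prolog and Epilog.
live_in_epilog = committed_before("Epilog", has_data_source=False)

print(live_in_metadata, live_in_data, live_in_epilog)
```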