
What does the "Use Unicode" checkbox mean in TM1 Architect Turbo Integrator?

Posted: Fri Oct 16, 2015 1:13 pm
by Stanislav2
Hi,
I am using Cognos TM1 10.1.1 Fix Pack 2, 64-bit.

1. In TM1 Architect, right-click Processes and select Create New Process.
2. The Turbo Integrator dialog opens. Click ODBC.
3. There is a "Use Unicode" checkbox. What does this checkbox mean?
a) That the source data from the ODBC data source is in Unicode?
b) That the target data in TM1 (e.g. dimensions, members, attributes on the TM1 server) is going to be stored as Unicode?
c) Something else?

I have always just left this checkbox checked, but our company is talking about multi-language support (including additional alphabets such as Russian Cyrillic) in all of our solutions, so I am wondering what this checkbox actually does.

Additional question: What does "Unicode" mean in this dialog? In theory Unicode has several encodings, such as UTF-8, UTF-16 and UTF-32. There are also variants such as "UTF-8 with BOM", "UTF-16 Big Endian" (with and without BOM), "UTF-16 Little Endian", etc.
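
To make the difference between these encodings concrete, here is a minimal Python sketch (not TM1 code, and it says nothing about which encoding TM1 itself uses). The letter "Ж" is just an arbitrary example character; the sketch prints its bytes in several Unicode encodings, plus the BOM bytes that mark UTF-16 byte order.

```python
# Illustration only: one Cyrillic letter in several Unicode encodings.
import codecs

letter = "Ж"  # U+0416, an arbitrary example character

# Encode the same character with explicit byte orders (no BOM is written).
encodings = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le"]
encoded = {name: letter.encode(name) for name in encodings}

for name, raw in encoded.items():
    print(name, raw.hex(), f"({len(raw)} bytes)")

# A BOM is an optional prefix that tells a reader the UTF-16 byte order.
boms = {
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
}
for name, bom in boms.items():
    print("BOM", name, bom.hex())
```

The same code point takes two bytes in UTF-8 and UTF-16 and four in UTF-32, which is why a reader has to know (or be told) which encoding a source uses.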

Can you point me to some additional source to read the details?
Thanks

Re: What does the "Use Unicode" checkbox mean in TM1 Architect Turbo Integrator?

Posted: Fri Oct 16, 2015 1:28 pm
by jim wood
Unicode is a way of storing data. Some databases are set up as Unicode, some are not. TM1 became Unicode at version 9.4 (if memory serves). For more details go here:
https://en.wikipedia.org/wiki/Unicode

Re: What does the "Use Unicode" checkbox mean in TM1 Architect Turbo Integrator?

Posted: Tue Oct 20, 2015 7:36 am
by Stanislav2
Thank you for the answer. One additional related question: how do I specify in TM1 Architect that a source text file is encoded in UTF-8?
I created a simple text file with a Cyrillic letter, and from the sample it is clear that the source file is read as Windows-1250 (the default Windows server setting) instead of UTF-8. I can see this because a single two-byte UTF-8 character is shown in TM1 as two single-byte characters. See attached picture. Is there a way to specify the encoding of the source file?
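
The symptom can be reproduced outside TM1. The following Python sketch is only an illustration of the effect, not of what TM1 does internally; "Ж" is an arbitrary example letter. It encodes one Cyrillic letter as UTF-8 and then decodes the bytes as Windows-1250.

```python
# Illustration only: UTF-8 bytes misread as Windows-1250.
letter = "Ж"                      # arbitrary example Cyrillic letter

raw = letter.encode("utf-8")      # one character, two bytes in UTF-8
misread = raw.decode("cp1250")    # each byte is taken as its own character

print(len(letter), len(raw), len(misread))
print(repr(misread))

# The bytes themselves are intact: re-encoding and decoding as UTF-8
# recovers the original letter.
roundtrip = misread.encode("cp1250").decode("utf-8")
print(roundtrip == letter)
```

So the file is not damaged; it is only interpreted with the wrong code page.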

Re: What does the "Use Unicode" checkbox mean in TM1 Architect Turbo Integrator?

Posted: Tue Oct 20, 2015 7:58 am
by Wim Gielis
Hello,

I have never used it myself, but did you have a look at the TI function 'SetInputCharacterSet'?

Re: What does the "Use Unicode" checkbox mean in TM1 Architect Turbo Integrator?

Posted: Tue Oct 20, 2015 8:53 am
by Stanislav2
@Wim Gielis, thanks for the help.

According to the TM1 Reference Guide: For formats lacking a valid byte-order-mark, the characters must be converted from some other encoding to UTF-8. The SetInputCharacterSet function lets you specify the character set used in a TurboIntegrator data source.

I can confirm the above statement from the documentation with two tests.

Test 1:
1. Opened the UTF-8 file with Notepad and just saved it. In the case of UTF-8, Notepad adds a BOM (byte order mark), i.e. three special bytes, at the very beginning of the file.
2. Pressed the Preview button in the Turbo Integrator window, and the UTF-8 characters were displayed correctly.
3. In Turbo Integrator I created a process that creates a dimension with members, and ran it.
4. Opened the dimension: the Cyrillic characters were loaded correctly. Problem solved.
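
For reference, the three bytes mentioned in step 1 can be inspected with a short Python sketch. This only shows what a UTF-8 BOM looks like; it makes no claim about how TM1 detects it, and the sample text is an arbitrary example.

```python
# Illustration only: the UTF-8 byte order mark at the start of a file.
import codecs

text = "Ж"                                        # arbitrary sample content
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")  # what a "UTF-8 with BOM" file holds

print(with_bom[:3].hex())                          # the three BOM bytes

# A reader can check for the marker before deciding how to decode.
has_bom = with_bom.startswith(codecs.BOM_UTF8)

# Python's "utf-8-sig" codec strips the BOM when decoding.
decoded = with_bom.decode("utf-8-sig")
print(has_bom, decoded == text)
```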

Test 2:
1. Back to my original UTF-8 file without a BOM at the beginning. When it is loaded in exactly the same way as in "Test 1", the characters are loaded into TM1 incorrectly, because TM1 uses the default system locale.
2. To work around this problem, as Wim suggested, I added the following code at the top of the Prolog tab: SetInputCharacterSet('TM1CS_UTF8');
and the characters are loaded correctly as dimension members.
Note: In the "Data Source" tab of the "Turbo Integrator" window the characters are still displayed incorrectly, most probably because "SetInputCharacterSet" only takes effect when the process is run. But this is not a problem, because the UTF-8 characters are imported correctly into the dimension members.
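
The behaviour seen in the two tests can be summarised as a small decision rule. The Python sketch below is only a model of that observed behaviour, not TM1's actual implementation: the function `decode_source`, its parameters and the `cp1250` default are hypothetical names chosen for illustration, and giving the BOM priority over the declared character set is an assumption.

```python
# Model of the observed behaviour only; all names here are hypothetical.
import codecs


def decode_source(raw, declared=None, system_default="cp1250"):
    """Decode file bytes: BOM first, then declared charset, then locale default."""
    if raw.startswith(codecs.BOM_UTF8):
        # Test 1: a UTF-8 BOM identifies the encoding by itself.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # Test 2: without a BOM, use the explicitly declared character set
    # (the role SetInputCharacterSet plays in the process), otherwise
    # fall back to the system default code page.
    return raw.decode(declared or system_default)


plain = "Ж".encode("utf-8")  # arbitrary example letter, no BOM

print(decode_source(codecs.BOM_UTF8 + plain))   # BOM present
print(decode_source(plain, declared="utf-8"))   # no BOM, charset declared
print(repr(decode_source(plain)))               # no BOM, nothing declared
```

Only the last call produces the two wrong characters, which matches the result of loading the BOM-less file without SetInputCharacterSet.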

Re: What does the "Use Unicode" checkbox mean in TM1 Architect Turbo Integrator?

Posted: Tue Oct 20, 2015 9:19 am
by Stanislav2
I did an additional test with a DB2 database defined as UTF-8, storing data in the varchar data type (data stored in UTF-8) and in the vargraphic data type (data stored in UCS-2).

I am still wondering which Unicode encoding TM1 uses internally. Is it UTF-8, or something else such as UTF-16, when "Use Unicode" is checked in Turbo Integrator? Is there any way I can check which Unicode encoding TM1 uses internally? I would like to know this because of the reporting that will follow from the TM1 data.

Below is a test I created with DB2 that I would like to share with you.

Test 3: