What does the "Use Unicode" checkbox mean in TM1 Architect Turbo Integrator?

Stanislav2
Posts: 31
Joined: Tue Aug 20, 2013 5:53 am
OLAP Product: TM1
Version: 10.1.1
Excel Version: -

Post by Stanislav2 »

Hi,
I am using Cognos TM1 10.1.1 Fix Pack 2, 64-bit.

1. In TM1 Architect, right-click Processes and select Create New Process.
2. The Turbo Integrator dialog opens. Click ODBC.
3. There is a "Use Unicode" checkbox. What does this checkbox mean?
a) That the source data from the ODBC data source is in Unicode?
b) That the target data in TM1 (e.g. dimensions, members, attributes on the TM1 server) will be stored as Unicode?
c) Something else?

I have always just left this checkbox checked, but our company is talking about multi-language support (including additional alphabets such as Russian Cyrillic) in all of our solutions, so I am wondering what this checkbox actually does.

Additional question: What does "Unicode" mean in this dialog? In theory Unicode has several encodings, such as UTF-8, UTF-16 and UTF-32. There are also variants like UTF-8 with BOM, UTF-16 big-endian and UTF-16 little-endian (each with or without a BOM), etc.
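
To make the distinction concrete, here is a minimal Python sketch (Python is used only for illustration; it is not part of TM1) showing how one Cyrillic character is serialized by the encodings named above, together with the byte order marks Python defines for them. It says nothing about which encoding TM1 itself uses.

```python
import codecs

ch = "Ж"  # U+0416, CYRILLIC CAPITAL LETTER ZHE

# The same code point, serialized by different Unicode encodings.
encoded = {
    "utf-8": ch.encode("utf-8"),
    "utf-16-le": ch.encode("utf-16-le"),
    "utf-16-be": ch.encode("utf-16-be"),
    "utf-32-le": ch.encode("utf-32-le"),
}

# Byte order marks that may precede the text in a file.
boms = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
}

for name, data in encoded.items():
    print(f"{name:10} bytes={data.hex(' ')}  bom={boms[name].hex(' ')}")
```

The character is always the same code point; only the byte layout and the optional BOM differ between encodings.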

Can you point me to an additional source where I can read the details?
Thanks
jim wood
Site Admin
Posts: 3951
Joined: Wed May 14, 2008 1:51 pm
OLAP Product: TM1
Version: PA 2.0.7
Excel Version: Office 365
Location: 37 East 18th Street New York

Post by jim wood »

Unicode is a way of storing data. Some databases are set up as Unicode, some are not. TM1 became Unicode at version 9.4 (if memory serves). For more details go here:
https://en.wikipedia.org/wiki/Unicode

Post by Stanislav2 »

Thank you for the answer. One additional related question: how do I specify in TM1 Architect that a source text file is encoded in UTF-8?
I created a simple text file with Cyrillic letters, and from the sample it is clear that the source file is interpreted as Windows-1250 (the default Windows server setting) instead of UTF-8. I can see this because a single two-byte UTF-8 character is shown in TM1 as two single-byte characters. See the attached picture. Is there a way to specify the encoding of the source file?
Attachment: cyrillic.png
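
The symptom described above (one two-byte UTF-8 character showing up as two single-byte characters) can be reproduced outside TM1. The following Python sketch is only an illustration; the helper name `misread_utf8` is hypothetical, and Windows-1250 is assumed as the wrong code page because that is the server default mentioned in the post.

```python
def misread_utf8(text: str, wrong_codec: str = "cp1250") -> str:
    """Encode text as UTF-8, then decode the bytes with a single-byte code page."""
    raw = text.encode("utf-8")
    # errors="replace" keeps the sketch from failing on bytes
    # that are undefined in the chosen code page.
    return raw.decode(wrong_codec, errors="replace")

sample = "Ж"                      # one Cyrillic character
raw = sample.encode("utf-8")      # two bytes: d0 96
garbled = misread_utf8(sample)    # two unrelated characters

print(len(sample), len(raw), len(garbled))
print(repr(garbled))
```

Each byte of the UTF-8 sequence is treated as a separate character, so the garbled string is twice as long as the original.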
Wim Gielis
MVP
Posts: 3105
Joined: Mon Dec 29, 2008 6:26 pm
OLAP Product: TM1, Jedox
Version: PAL 2.0.9.18
Excel Version: Microsoft 365
Location: Brussels, Belgium

Post by Wim Gielis »

Hello,

I have never used it myself, but have you had a look at the TI function 'SetInputCharacterSet'?
Best regards,

Wim Gielis


Post by Stanislav2 »

@Wim Gielis, thanks for the help.

According to the TM1 Reference Guide documentation: For formats lacking a valid byte-order-mark, the characters must be converted from some other encoding to UTF-8. The SetInputCharacterSet function lets you specify the character set used in a TurboIntegrator data source.

I can confirm the above statement from the documentation with two tests.

Test 1:
1. Opened the UTF-8 file with Notepad and just saved it. When saving as UTF-8, Notepad adds a BOM (byte order mark), i.e. three special bytes at the very beginning of the file.
2. Pressed the Preview button in the Turbo Integrator window, and the UTF-8 characters are displayed correctly.
3. In Turbo Integrator I created a process that creates a dimension with members, and ran the process.
4. Opened the dimension: the Cyrillic characters are loaded correctly, so it works fine. Problem solved.
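
To check whether a given file already carries the three-byte UTF-8 BOM mentioned in Test 1, something like the following Python sketch can be used. It is an illustration only; `has_utf8_bom` is a hypothetical helper, and the sample text is made up. In practice the bytes would come from reading the source file in binary mode.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)

text = "Москва"
with_bom = text.encode("utf-8-sig")   # UTF-8 preceded by the BOM bytes ef bb bf
without_bom = text.encode("utf-8")    # plain UTF-8, no BOM

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))
print(with_bom[:3].hex(" "))
```

Apart from the three leading bytes, the two byte strings are identical.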

Test 2:
1. Back to my original UTF-8 file without a BOM at the beginning. If it is loaded in exactly the same way as in the "Test 1" steps, the characters are loaded incorrectly into TM1, because TM1 uses the default system locale.
2. To work around this problem, as Wim suggested, I added the following code at the top of the Prolog tab: SetInputCharacterSet('TM1CS_UTF8');
and the characters are loaded correctly as dimension members.
Note: In the "Data Source" tab of the Turbo Integrator window the characters are still displayed incorrectly, most probably because the "SetInputCharacterSet" command only takes effect when the process is run. But this is not a problem, because the UTF-8 characters are imported correctly into the dimension members.
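
The behaviour seen in Test 1 and Test 2 can be summarized as a decision order: use the BOM if there is one, otherwise use an explicitly declared character set, otherwise fall back to the system default. The Python sketch below only models that observed behaviour; it is not TM1's implementation. The function `decode_source` and its parameters are hypothetical, and cp1250 is assumed as the system default because that is the server setting described earlier in the thread.

```python
import codecs
from typing import Optional

def decode_source(data: bytes, declared: Optional[str] = None,
                  system_default: str = "cp1250") -> str:
    """Decode file bytes: BOM first, then declared charset, then system default."""
    if data.startswith(codecs.BOM_UTF8):
        # Test 1: the BOM identifies the file as UTF-8.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if declared is not None:
        # Test 2: the character set is stated explicitly.
        return data.decode(declared)
    # No BOM and nothing declared: the system code page is assumed.
    return data.decode(system_default, errors="replace")

raw = "Ж".encode("utf-8")  # UTF-8 bytes without a BOM

print(decode_source(codecs.BOM_UTF8 + raw))     # BOM present
print(decode_source(raw, declared="utf-8"))     # charset declared
print(repr(decode_source(raw)))                 # fallback, garbled
```

Only the last call produces garbled output, which matches the original file loaded without SetInputCharacterSet.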

Post by Stanislav2 »

I did an additional test with a DB2 database defined as UTF-8, storing data in the varchar data type (data stored in UTF-8) and in the vargraphic data type (data stored in UCS-2).

I am still wondering which Unicode encoding TM1 uses internally. Is it UTF-8, or something else such as UTF-16, when Use Unicode is checked in Turbo Integrator? Is there any way I can check which Unicode encoding TM1 uses internally? I would like to know this because of the reporting that will be built on the TM1 data.

Below is the test I created with DB2, which I would like to share with you.

Test 3:
Attachment: cyrillic_db2.png