Cube Design for Survey Data
Posted: Mon Oct 05, 2009 7:08 pm
Hi All,
I'm struggling with the appropriate way to design my database/cube. I'll start with some background. I'm trying to develop a cube that will allow users to see results of survey data. This survey data consists of what we call demographic questions and metric questions. the demographic questions describe the respondent (ie. gender, age, geography) and the metric questions are the actual facts we are collecting and aggregating on (ie. On a scale of 1 - 7 how happy are you at work?). My current setup is a fact table with answers to the survey questions, FactResponse. So if it's a metric question, it will have a fk in to the question table. It will also have a responseValue column that you can aggregate on. This table includes a respondentID. So a respondent ID would have multiple rows in the fact table. (one for every question in the survey including the demographic questions) I have a few dim tables, dimSurvey (which of our many surveys this fact is from), dimOrganization (which company the response belongs to), dimDate (when the survey was completed) and where I'm struggling the most dimQuestion.
Details on dimQuestion: This table includes a row for every answer choice for every question. so the question, "On a scale of 1 - 7 how happy are you at work?" there are 7 rows. the columns of this table are basically a PrimaryKey, Question Text, QuestionAnswer. Keep in mind that this table also includes the demographic questions (ie. "What is your gender?" has 2 rows "Male" and "Female")
So what i need to develop is a cube where i can get the average of the question "How happy are you at work" and slice it by multiple demographics.
So how happy are males between the ages of 18-25? The answer would be the average of ResponseValue column of the FactReponse table for that particular question that matches those demographics. (of course the demographics are on seperate rows within that fact table)
So to me, while trying to design this, i find myself apparently needing to slice by the dimQuestion dimension multiple times.(so that i'm slicing by gender and age). I can't grasp in my head the appropriate way to set this up and i'd love to hear some opinions from people with more experience than me.
Some notes to consider. Each time a company takes a survey, they are able to add their own custom questions specific to their company. So they may add a business unit demographic question that has their specific BUs and i'd need to cut by that. (it would be an additional question in the survey) Likewise they could add a custom metric question that would need to be aggregated on.
I'm struggling with the appropriate way to design my database/cube. I'll start with some background. I'm trying to develop a cube that will allow users to see results of survey data. This survey data consists of what we call demographic questions and metric questions. the demographic questions describe the respondent (ie. gender, age, geography) and the metric questions are the actual facts we are collecting and aggregating on (ie. On a scale of 1 - 7 how happy are you at work?). My current setup is a fact table with answers to the survey questions, FactResponse. So if it's a metric question, it will have a fk in to the question table. It will also have a responseValue column that you can aggregate on. This table includes a respondentID. So a respondent ID would have multiple rows in the fact table. (one for every question in the survey including the demographic questions) I have a few dim tables, dimSurvey (which of our many surveys this fact is from), dimOrganization (which company the response belongs to), dimDate (when the survey was completed) and where I'm struggling the most dimQuestion.
Details on dimQuestion: This table includes a row for every answer choice for every question. so the question, "On a scale of 1 - 7 how happy are you at work?" there are 7 rows. the columns of this table are basically a PrimaryKey, Question Text, QuestionAnswer. Keep in mind that this table also includes the demographic questions (ie. "What is your gender?" has 2 rows "Male" and "Female")
So what i need to develop is a cube where i can get the average of the question "How happy are you at work" and slice it by multiple demographics.
So how happy are males between the ages of 18-25? The answer would be the average of ResponseValue column of the FactReponse table for that particular question that matches those demographics. (of course the demographics are on seperate rows within that fact table)
So to me, while trying to design this, i find myself apparently needing to slice by the dimQuestion dimension multiple times.(so that i'm slicing by gender and age). I can't grasp in my head the appropriate way to set this up and i'd love to hear some opinions from people with more experience than me.
Some notes to consider. Each time a company takes a survey, they are able to add their own custom questions specific to their company. So they may add a business unit demographic question that has their specific BUs and i'd need to cut by that. (it would be an additional question in the survey) Likewise they could add a custom metric question that would need to be aggregated on.