Search This Blog

Wednesday, January 31, 2007

Data Modeling Interview Questions

What is the difference between star flake and snow flake schema?

Star Schema:

Well in star schema you just enter your desired facts and all the primary keys of your dimensional tables in Fact table. And fact tables primary is the union of its all dimension table key. In star schema dimensional tables are usually not in BCNF form.

SnowFlake:

Its almost like starschema but in this our dimension tables are in 3rd NF, so more dimensions tables. And these dimension tables are linked by primary, foreign key relation.

What is data sparsity and how it effect on aggregation?

Data sparsity is term used for how much data we have for a particular dimension/entity of the model.

It affects aggregation depending on how deep the combination of members of the sparse dimension make up. If the combination is a lot and those combinations do not have any factual data then creating space to store those aggregations will be a waste as a result, the database will become huge.

What is the difference between hashed file stage and sequential file stage in relates to DataStage Server?

In datastage server jobs,can we use sequential filestage for a lookup instead of hashed filestage.If yes ,then whats the advantage of a Hashed File stage over sequential filestage

search is faster in hash files as you can directly get the address of record directly by hash algorithm as records are stored like that but in case of sequential file u must compare all the records.

When should you consider denormalization?

Denormalization is used when there is a lot of tables involved in retreiving data.Denormalization is done in dimentional modelling used to construct a data ware house.This is not usually done for data bases of transactional systems.

What is ERD?

ERD - Entity Relationship Diagram

ERD is a modeling technique that represents a logical design of a particular business process. Each entity has attributes and different entities are connected through primary key/foreign key relationships.

Data models are tools used in analysis to describe the data requirements and assumptions in the system from a top-down perspective. They also set the stage for the design of databases later on in the SDLC.

There are three basic elements in ER models:
Entities are the "things" about which we seek information.
Attributes are the data we collect about the entities.
Relationships provide the structure needed to draw information from multiple entities.

Every statement written above is correct regarding ER Diagrams except that they fall under conceptual design phase and not under logical.

A relational database design goes through 3 phases:

1. Conceptual ( ER Diagrams)

2. Logical ( Relational Schema)

3. Physical ( Physical Database objects)

Why are recursive relationships are bad? How do you resolve them?

Recursive relationships are an interesting and more complex concept than the relationships you have seen in the previous chapters, such as a one-to-one, one-to-many, and many-to-many. A recursive relationship occurs when there is a relationship between an entity and itself. For example, a one-to-many recursive relationship occurs when an employee is the manager of other employeess. The employee entity is related to itself, and there is a one-to-many relationship between one employee (the manager) and many other employees (the people who report to the manager). Because of the more complex nature of these relationships, we will need slightly more complex methods of mapping them to a schema and displaying them in a stylesheet.

What is an artificial (derived) primary key? When should it be used?

Using a name as the primary key violates the principle of stability. The social security number might be a valid choice, but a foreign employee might not have a social security number. This is a case where a derived, rather than a natural, primary key is appropriate. A derived key is an artificial key that you create. A natural key is one that is already part of the database.

Whether artificial primary key is equvalent to the row_id of the table or it is the the combination of the previous primary key field and the new field which creates the importance of it.

No comments: