Review
Homework Answer
For orders in July of 2008, list the territory IDs, average total sale and number of sales per territory ID, sorted by best to worst sales
SELECT TerritoryID AS ID, AVG(TotalDue) AS AOV, COUNT(*) AS NumOrders FROM sales.SalesOrderHeader WHERE --OrderDate > '7/1/2008' AND OrderDate < '2008-8-1' OrderDate BETWEEN '7/1/2008' AND '8/1/2008' GROUP BY TerritoryID ORDER BY AOV DESC
COUNTing
COUNT(*) is very efficient since it has no restrictions and counts every qualifying row. COUNT(column) does not count NULL values, but otherwise counts every row and pretty fast. COUNT(DISTINCT column) requires the database engine to track the values it has already seen. It also does not count NULL values. This is the least efficient choice, so use it only when needed.
Take a look at the data from this query:
SELECT SalesOrderID, CreditCardID FROM Sales.SalesOrderHeader WHERE OrderDate > '7/1/2008'
Now, observe the differences between the variations of COUNT:
SELECT COUNT(*), COUNT(distinct CreditCardID), COUNT(CreditCardID) FROM Sales.SalesOrderHeader WHERE OrderDate > '7/1/2008'
HAVING
The HAVING clause is the equivalent of WHERE for aggregate functions. Pesky aggregate functions.
Example: How many orders by shipping method since 8/7/2013:
SELECT TerritoryID, COUNT(*) as NumOrders FROM Sales.SalesOrderHeader WHERE OrderDate > '7/1/2008' GROUP BY TerritoryID
But let’s only look at methods that have a statistically relevant number of orders:
SELECT TerritoryID, COUNT(*) as NumOrders FROM Sales.SalesOrderHeader WHERE OrderDate > '7/1/2008' GROUP BY TerritoryID HAVING COUNT(*) > 10
Normalization
Without getting technical, one of the biggest goals in transactional database design is to reduce data duplication. This means a data value, or collection of values like an address, do not appear more than once. The amount of data duplication in a database schema is known as Normalization. This is why tables get split out to more tables and we have dependencies from one table to another.
When we design a table like Sales.SalesOrderHeader, we might put the shipping address in the table. However, once a customer orders a second time from us we find the same shipping address appearing more than once in our data. So, we pull the shipping address columns out to a separate table and leave behind a reference ID.
Foreign Keys
This reference ID is called a Foreign Key. It has the same value as the Primary Key in the table it references. A table can have a compound Primary Key which is multiple columns that together form a unique ID for the table. Referencing a table with a compound Primary Key requires all of columns involved in the key, thus a compound Foreign Key.
Take a look at the columns in the Orders table in the Object Explorer and you will see several Foreign Keys (FK).
[INNER] JOIN … ON … [AND]
We join tables together because there is desired related data for our query. For example, to get the name of the customer for an order, we will have to join from the orders table to the customers table.
- There multiple types of Joins with the most common, and default, being the Inner Join.
- Draw the columns of another table into the query by specifying the relationship between the query and the incoming table:
JOIN JoiningTable ON JoiningTable.PrimaryKey = ReferencingTable.ForeignKey - For compound keys, specify all of the relationships separated by AND:
JOIN JoiningTable ON JoiningTable.PrimaryKey1 = ReferencingTable.ForeignKey1 AND JoiningTable.PrimaryKey2 = ReferencingTable.ForeignKey2 - Any rows that cannot be matched across the join will be dropped from the result set. This usually happens when the FK column contains NULL.
- Make a habit now of using table aliases to save typing. Table aliases are required for future complex join structures.
Here is a list of the number of orders by territory:
SELECT t.Name, COUNT(*) FROM Sales.SalesOrderHeader o JOIN Sales.SalesTerritory t ON t.TerritoryID = o.TerritoryID WHERE o.OrderDate > '7/1/2008' GROUP BY t.Name
Count of orders by shipping method and territory which requires joining from SalesOrderHeader to SalesTerritory and SalesOrderHeader to ShippingMethod:
SELECT sm.Name, t.Name, COUNT(*) FROM Sales.SalesOrderHeader o JOIN Sales.SalesTerritory t ON t.TerritoryID = o.TerritoryID JOIN Purchasing.ShipMethod sm ON sm.ShipMethodID = o.ShipMethodID WHERE o.OrderDate > '7/1/2008' GROUP BY t.Name, sm.Name
Count of orders by shipping state which requires joining from SalesOrderHeader to Address and then from Address to StateProvice:
SELECT sp.Name, COUNT(*) FROM Sales.SalesOrderHeader o JOIN Person.Address a ON a.AddressID = o.ShipToAddressID JOIN Person.StateProvince sp ON sp.StateProvinceID = a.StateProvinceID WHERE o.OrderDate > '7/1/2008' GROUP BY sp.Name
Homework Exercise
List AOV per credit card type and country region code for July, 2008. Drop rows with statistically irrelevant results, i.e. less than 10. Sort by card type, AOV descending, then country.