Newer
Older
labs / tiddlers / content / labs / lab05 / _Labs_05_Task 2_ Joins.md
Very often, you need information from two or more tables. The relational *join* operation combines related data from multiple tables, returning a single table. Specifically, join combines pairs of rows from the two tables that have the same value in some common set of columns—most often the foreign key in one and the referenced primary key in the other. For example, to see full details of samples and the scientists that gathered them:  
```plaintext
select *
from Sample inner join Scientist using (Scientist_Num);
```  
Run the above query. Why doesn’t the Scientist_Num column appear twice, given that it exists in both tables?  

---
What happens if you swap the order of Sample and Scientist in the FROM clause? Is the difference important? 

---
The most common type of join is the inner join, which retrieves only matching pairs. Unmatched rows don’t appear in the result. If you want unmatched rows to be included, there are several kinds of outer join (syntax: [LEFT|RIGHT|FULL] OUTER JOIN) that can be used instead. These usually introduce nulls into the result.  

If the join columns have different names, you’ll need to specify join conditions with ON instead of USING. For example, if the Scientist_Num foreign key in Sample was instead called Scientist_No:  
```plaintext
select *
from Sample inner join Scientist on (Scientist_Num = Scientist_No);
```
Rewrite the first join above with ON instead of USING. What additional changes do you need to make to get this to work, and why? 

---
Joins involving more than two tables can be done simply by adding more join operations in the FROM clause. For example:  
```plaintext
select *
from Sample inner join Scientist using (Scientist_Num)
            inner join Site using (Site_ID);
```
You should always have an ERD handy when writing queries. The ERD is like a map of the database, giving you an overview of all tables and their columns, and also the relationships among them. Each relationship you traverse will add a join to your query. If you need data from two distant tables, you may have to join through a number of intervening tables as well. To be useful, the ERD should explicitly include all foreign key attributes, etc.  

**Here are some hints for writing joins:**

•	Work incrementally. Plan the path using the ERD, and implement one join at a time, testing at each step. If there are multiple paths, make sure you choose the right one!  
•	Don’t do more joins than necessary. Use the foreign key attributes!  
•	Check the number of rows returned by the query. If you see more rows than the largest table being joined, you have a problem.  
•	Fill in the SELECT clause last (just use select * while testing).