Asked  7 Months ago    Answers:  5   Viewed   42 times

Can anyone explain how to implement one-to-one, one-to-many and many-to-many relationships while designing tables with some examples?

 Answers

49

One-to-one: Use a foreign key to the referenced table:

student: student_id, first_name, last_name, address_id
address: address_id, address, city, zipcode, student_id # you can have a
                                                        # "link back" if you need

You must also put a unique constraint on the foreign key column (addess.student_id) to prevent multiple rows in the child table (address) from relating to the same row in the referenced table (student).

One-to-many: Use a foreign key on the many side of the relationship linking back to the "one" side:

teachers: teacher_id, first_name, last_name # the "one" side
classes:  class_id, class_name, teacher_id  # the "many" side

Many-to-many: Use a junction table (example):

student: student_id, first_name, last_name
classes: class_id, name, teacher_id
student_classes: class_id, student_id     # the junction table

Example queries:

 -- Getting all students for a class:

    SELECT s.student_id, last_name
      FROM student_classes sc 
INNER JOIN students s ON s.student_id = sc.student_id
     WHERE sc.class_id = X

 -- Getting all classes for a student: 

    SELECT c.class_id, name
      FROM student_classes sc 
INNER JOIN classes c ON c.class_id = sc.class_id
     WHERE sc.student_id = Y

Tuesday, June 1, 2021
 
Lance
answered 7 Months ago
69

In general I prefer obtion "B" (i.e. one table for base class and one table for each "concrete" subclass).

Of course this has a couple of drawbacks: first of all you have to join at least 2 tables whenever you have to read a full instance of a subclass. Also, the "base" table will be constantly accessed by anyone who has to operate on any kind of note.

But this is usually acceptable unless you have extreme cases (billions of rows, very quick response times required and so on).

There is a third possible option: map each subclass to a distinct table. This helps partitioning your objects but costs more in development effort, in general.

See this for a complete discussion.

(Regarding your "C" solution, using VARIANT: I can't comment on the merits/demerits, because it looks like a proprietary solution - what is it ? Transact-SQL? and I am not familiar with it).

Wednesday, June 30, 2021
 
pamelus
answered 6 Months ago
50

I've seen variations of this question so many times I thought I would write a Q&A.

Key DynamoDB Basics

Before reading this you should understand:

  • Every DynamoDB table has a unique primary key
  • A primary key must be composed of a partition key, and can have optionally have a sort key. A primary key with both a partition key and a sort key is a composite key.
  • A GetItem request returns one and only one item using its unique primary key.
  • A Query does a fast lookup and must specify one and only one partition key. It can return multiple items.
  • A Scan evaluates every item in a table and may return a subset based on filter parameters. Scans are the correct choice in some circumstances, but can be slow and costly if used incorrectly.
  • A Global Secondary Index (GSI) has a different partition key to the base table. Think of it a bit like having two tables (the base table and GSI) that are kept in sync. Depending on use, a GSI may double the cost of your base table.
  • A Local Secondary Index (LSI) has the same partition key as the base table but a different sort key. Think of it as a different way to sort your base table data, but only within the partition keys. An LSI does not cost you anything.

One-to-one


We can model passports and people to demonstrate this relationship. One passport can only have one owner, and one person can only have one passport.

The approach is very simple. We have two tables, and one of those tables should have a foreign key.

Passport table:

Partition Key: PassportId

╔════════════╦═══════╦════════════╗
║ PassportId ║ Pages ║   Issued   ║
╠════════════╬═══════╬════════════╣
║ P1         ║    15 ║ 11/03/2009 ║
║ P2         ║    18 ║ 09/02/2018 ║
╚════════════╩═══════╩════════════╝

Passport holder table:

Partition Key: PersonId

╔══════════╦════════════╦══════╗
║ PersonId ║ PassportId ║ Name ║
╠══════════╬════════════╬══════╣
║ 123      ║ P1         ║ Jane ║
║ 234      ║ P2         ║ Paul ║
╚══════════╩════════════╩══════╝

Notice that PersonId did not appear in the passport table. If we did that, we would have two places with the same information (which passports belong to which person). This would lead to extra data updates and potentially some data quality issues if the tables did not agree on who owns which passport.

However, we are missing a use case. We can easily look a person up by their PersonId, and find which passport they have. But what if we have a PassportId and we need to find who owns it? In the current model we would need to perform a Scan on the Passport holder table. If this is a regular use case, we wouldn't want to use a Scan. To support a GetItem we can simply add a GSI to the Passport holder table:

Passport holder table GSI:

Partition Key: PassportId

╔════════════╦══════════╦══════╗
║ PassportId ║ PersonId ║ Name ║
╠════════════╬══════════╬══════╣
║ P1         ║ 123      ║ Jane ║
║ P2         ║ 234      ║ Paul ║
╚════════════╩══════════╩══════╝

Now we can look up relationships using PassportId or PersonId very quickly and cheaply.

There are other options for modelling this. For example you could have a 'plain' Passport table and Person table with no foreign keys, then have a third auxiliary table that simple maps PassortIds and PersonIds together. I don't think that's cleanest design in this case, but if you prefer it, there is nothing wrong with that approach. Note that their is an example of an auxiliary relationship table in the many-to-many relationship section.


One-to-many


We can model pets and owners to demonstrate this relationship. Pets can only have one owner, but owners can have many pets.

The model looks very similar to the one-to-one model, so I will just focus on this differences.

Pet table:

Partition Key: PetId

╔═══════╦═════════╦════════╗
║ PetId ║ OwnerId ║ Type   ║
╠═══════╬═════════╬════════╣
║ P1    ║ O1      ║ Dog    ║
║ P2    ║ O1      ║ Cat    ║
║ P3    ║ O2      ║ Rabbit ║
╚═══════╩═════════╩════════╝

Owner table:

Partition Key: OwnerId

╔═════════╦════════╗
║ OwnerId ║ Name   ║
╠═════════╬════════╣
║ O1      ║ Angela ║
║ O2      ║ David  ║
╚═════════╩════════╝

We put the foreign key in the many table. If we did it the other way around, and put PetIds in the Owner table, one Owner Item would have to have a set of PetIds, and that would get complicated to manage.

If we want to find out the Owner for a Pet, its very easy. We can do a GetItem to return the Pet Item, and it tells us who the owner is. But the other way around is harder - if we have an OwnerId, which Pets do they own? To save us have to do a Scan on the Pet table, we instead add a GSI to the Pet table.

Pet table GSI

Partition Key: OwnerId

╔═════════╦═══════╦════════╗
║ OwnerId ║ PetId ║ Type   ║
╠═════════╬═══════╬════════╣
║ O1      ║ P1    ║ Dog    ║
║ O1      ║ P2    ║ Cat    ║
║ O2      ║ P3    ║ Rabbit ║
╚═════════╩═══════╩════════╝

If we have an OwnerId and we want to find their Pets, we can perform a Query on the Pet table GSI. For example a Query on Owner O1 will return the items with PetId P1 and P2.

You may notice something interesting here. A primary key must be unique for a table. This is true only for the base table. A GSI primary key, in this case just the GSI partition key, does not have to be unique.

In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique

On a side note, a GSI does not need to project all of the same attributes as the base table. If you are using the GSI for lookups only, you may wish to only project the GSI key attributes.


Many-to-many


There are three main ways to model a many-to-many relationship in DynamoDB. Each have strengths and weaknesses.

We can use the example of Doctors and Patients to model this relationship. A Doctor can have many patients and a patient can have many Doctors.


Many-to-many - Option 1 - Auxiliary table


Generally, this is my preferred approach, which is why it goes first. The idea is to create 'plain' base tables with no relationship references. The relationship references then go in auxiliary tables (one auxiliary table per relationship type - in this case just Doctors-Patients).

Doctor table:

Partition Key: DoctorId

╔══════════╦═══════╗
║ DoctorId ║ Name  ║
╠══════════╬═══════╣
║ D1       ║ Anita ║
║ D2       ║ Mary  ║
║ D3       ║ Paul  ║
╚══════════╩═══════╝

Patient table

Partition Key: PatientId

╔═══════════╦═════════╦════════════╗
║ PatientId ║ Name    ║ Illness    ║
╠═══════════╬═════════╬════════════╣
║ P1        ║ Barry   ║ Headache   ║
║ P2        ║ Cathryn ║ Itchy eyes ║
║ P3        ║ Zoe     ║ Munchausen ║
╚═══════════╩═════════╩════════════╝

DoctorPatient table (auxiliary table)

Partition Key: DoctorId

Sort Key: PatientId

╔══════════╦═══════════╦══════════════╗
║ DoctorId ║ PatientId ║ Last Meeting ║
╠══════════╬═══════════╬══════════════╣
║ D1       ║ P1        ║ 01/01/2018   ║
║ D1       ║ P2        ║ 02/01/2018   ║
║ D2       ║ P2        ║ 03/01/2018   ║
║ D2       ║ P3        ║ 04/01/2018   ║
║ D3       ║ P3        ║ 05/01/2018   ║
╚══════════╩═══════════╩══════════════╝

DoctorPatient table GSI

Partition Key: PatientId

Sort Key: DoctorId

╔═══════════╦══════════╦══════════════╗
║ PatientId ║ DoctorId ║ Last Meeting ║
╠═══════════╬══════════╬══════════════╣
║ P1        ║ D1       ║ 01/01/2018   ║
║ P2        ║ D1       ║ 02/01/2018   ║
║ P2        ║ D2       ║ 03/01/2018   ║
║ P3        ║ D2       ║ 04/01/2018   ║
║ P3        ║ D3       ║ 05/01/2018   ║
╚═══════════╩══════════╩══════════════╝

There are three tables, the DoctorPatient auxiliary table is the interesting one.

The DoctorPatient base table primary key must be unique, so we create a composite key of the DoctorId (partition key) and PatientId (sort key).

We can perform a Query on the DoctorPatient base table using DoctorId to get all patients that a Doctor has.

We can perform a Query on the DoctorPatient GSI using PatientId to get all of the Doctors associated with a Patient.

The strengths of this approach are a clean separation of tables, and the ability to map simple business objects directly to the database. It requires no use of more advanced features such as sets.

It is necessary to co-ordinate some updates, for example if you delete a Patient, you also need to be careful to delete the relationships in the DoctorPatient table. However the chance of introducing data quality issues is low compared to some other approaches.

EDIT: DynamoDB now supports Transactions, allowing you to co-ordinate multiple updates into a single atomic transaction across multiple tables.

A potential weakness of this approach is that it requires 3 tables. If you are provisioning tables with throughput, the more tables there are, the thinner you have to spread your capacity. However with the new on-demand feature, this is not a concern.


Many-to-many - Option 2 - Foreign key sets


This approach uses just two tables.

Doctor table:

Partition Key: DoctorId

╔══════════╦════════════╦═══════╗
║ DoctorId ║ PatientIds ║ Name  ║
╠══════════╬════════════╬═══════╣
║ D1       ║ P1,P2      ║ Anita ║
║ D2       ║ P2,P3      ║ Mary  ║
║ D3       ║ P3         ║ Paul  ║
╚══════════╩════════════╩═══════╝

Patient table:

Partition Key: PatientId

╔═══════════╦══════════╦═════════╗
║ PatientId ║ DoctorIds║  Name   ║
╠═══════════╬══════════╬═════════╣
║ P1        ║ D1       ║ Barry   ║
║ P2        ║ D1,D2    ║ Cathryn ║
║ P3        ║ D2,D3    ║ Zoe     ║
╚═══════════╩══════════╩═════════╝

This approach involves storing relationships as a set in each table.

To find the Patients for a Doctor, we can use GetItem on the Doctor table to retrieve the Doctor item. Then the PatientIds are stored as a set in a Doctor attribute.

To find the Doctors for a Patient, we can use GetItem on the Patient table to retrieve the Patient item. Then the DoctorIds are stored as a set in a Patient attribute.

The strength of this approach is that there is a direct mapping between business objects and database tables. There are only two tables so if you are using provision throughput capacity, it doesn't need to be spread too thinly.

The major downside to this approach is the potential for data quality issues. If you link a Patient to a Doctor, you have you co-ordinate two updates, one to each table. What happens if one update fails? You data can get out of sync.

Another downside is the use of Sets in both tables. The DynamoDB SDKs are designed to handle Sets, but certain operations can be complicated when Sets are involved.


Many-to-many - Option 3 - Graph Schema


AWS have previously referred to this as the Adjacency List pattern. It is more commonly referred to as a Graph database or a Triple Store.

I have previously answered this question on the AWS Adjancey List Pattern which seems to have helped some people understand it.

And there is a recent presentation by AWS that talks a lot about this pattern here

The approach involves putting all of the data in just one table.

I've just drawn some example rows rather than the whole table:

Partition Key: Key1

Sort Key: Key2

╔═════════╦═════════╦═══════╦═════════════╦══════════════╗
║ Key1    ║ Key2    ║ Name  ║   illness   ║ Last Meeting ║
╠═════════╬═════════╬═══════╬═════════════╬══════════════╣
║ P1      ║ P1      ║ Barry ║ Headache    ║              ║
║ D1      ║ D1      ║ Anita ║             ║              ║
║ D1      ║ P1      ║       ║             ║ 01/01/2018   ║
╚═════════╩═════════╩═══════╩═════════════╩══════════════╝

And then a GSI is required that inverts the keys:

Partition Key: Key2

Sort Key: Key1

╔═════════╦═════════╦═══════╦═════════════╦══════════════╗
║ Key2    ║ Key1    ║ Name  ║   illness   ║ Last Meeting ║
╠═════════╬═════════╬═══════╬═════════════╬══════════════╣
║ P1      ║ P1      ║ Barry ║ Headache    ║              ║
║ D1      ║ D1      ║ Anita ║             ║              ║
║ P1      ║ D1      ║       ║             ║ 01/01/2018   ║
╚═════════╩═════════╩═══════╩═════════════╩══════════════╝

This model has some strengths in some specific circumstances - it can perform well in highly connected data. If you format your data well, you can achieve extremely fast and scalable models. It is flexible in that you can store any entity or relationship in the table without updating your schema/tables. If you are provisioning throughput capacity it can be efficient as all of the throughput is available to any operation across the application.

This model suffers from some huge downsides if used incorrectly or without serious consideration.

You lose any direct mapping between your business objects and your tables. This almost always results in unreadable spaghetti code. Performing even simple queries can feel very complex. Managing data quality becomes difficult as there is no obvious mapping between the code and the database. Most projects I've seen that use this approach end up writing various utilities, some of which become products in their own right, just to manage the database.

Another minor problem is that every attribute for every item in your model has to exist on one table. This usually results in a table that has hundreds of columns. In itself this is not a problem, but trying to work on a table with that many columns usually throws up simple problems like difficulty in viewing the data.

In short I think AWS have probably released what should have been a useful article in a set of articles, but by failing to introduce other (simpler) concepts for managing many-to-many relationships, they have confused lots of people. So to be clear, the adjacency list pattern can be useful, but its not the only option for modelling many-to-many relationships in DynamoDB. By all means use it if it works for your circumstances such as seriously Big Data, but if not, try one of the simpler models.

Wednesday, July 28, 2021
 
Sethunath
answered 5 Months ago
12

Avoid stringly-typed data by replacing VALUE with NUMBER_VALUE, DATE_VALUE, STRING_VALUE. Those three types are good enough most of the time. You can add XMLTYPE and other fancy columns later if they're needed. And for Oracle, use VARCHAR2 instead of CHAR to conserve space.

Always try to store values as the correct type. Native data types are faster, smaller, easier to use, and safer.

Oracle has a generic data type system (ANYTYPE, ANYDATA, and ANYDATASET), but those types are difficult to use and should be avoided in most cases.

Architects often think using a single field for all data makes things easier. It makes it easier to generate pretty pictures of the data model but it makes everything else more difficult. Consider these issues:

  1. You cannot do anything interesting with data without knowing the type. Even to display data it's useful to know the type to justify the text. In 99.9% of all use cases it will be obvious to the user which of the 3 columns is relevant.
  2. Developing type-safe queries against stringly-typed data is painful. For example, let's say you want to find "Date of Birth" for people born in this millennium:

    select *
    from ReportFieldValue
    join ReportField
        on ReportFieldValue.ReportFieldid = ReportField.id
    where ReportField.name = 'Date of Birth'
        and to_date(value, 'YYYY-MM-DD') > date '2000-01-01'
    

    Can you spot the bug? The above query is dangerous, even if you stored the date in the correct format, and very few developers know how to properly fix it. Oracle has optimizations that make it difficult to force a specific order of operations. You'll need a query like this to be safe:

    select *
    from
    (
        select ReportFieldValue.*, ReportField.*
            --ROWNUM ensures type safe by preventing view merging and predicate pushing.
            ,rownum
        from ReportFieldValue
        join ReportField
            on ReportFieldValue.ReportFieldid = ReportField.id
        where ReportField.name = 'Date of Birth'
    )
    where to_date(value, 'YYYY-MM-DD') > date '2000-01-01';
    

    You don't want to have to tell every developer to write their queries that way.

Wednesday, July 28, 2021
 
SpiderLinked
answered 5 Months ago
82

EF doesn't know where User.Events has to be mapped to. It could be Event.CreatedBy or it could be Event.Users. Both would result in a valid model. You must give EF a little hint what you want by applying the [InverseProperty] attribute:

public class User
{
    ...
    [InverseProperty("Users")]
    public virtual ICollection<Event> Events { get; set; }
    ...
}
Monday, September 27, 2021
 
steros
answered 3 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :
 
Share