Asked  6 Months ago    Answers:  5   Viewed   428 times

I'm working on a multi-tenanted application in which some users can define their own data fields (via the admin) to collect additional data in forms and report on the data. The latter bit makes JSONField not a great option, so instead I have the following solution:

class CustomDataField(models.Model):
    """
    Abstract specification for arbitrary data fields.
    Not used for holding data itself, but metadata about the fields.
    """
    site = models.ForeignKey(Site, default=settings.SITE_ID)
    name = models.CharField(max_length=64)

    class Meta:
        abstract = True

class CustomDataValue(models.Model):
    """
    Abstract specification for arbitrary data.
    """
    value = models.CharField(max_length=1024)

    class Meta:
        abstract = True

Note how CustomDataField has a ForeignKey to Site - each Site will have a different set of custom data fields, but use the same database. Then the various concrete data fields can be defined as:

class UserCustomDataField(CustomDataField):
    pass

class UserCustomDataValue(CustomDataValue):
    custom_field = models.ForeignKey(UserCustomDataField)
    user = models.ForeignKey(User, related_name='custom_data')

    class Meta:
        unique_together=(('user','custom_field'),)

This leads to the following use:

custom_field = UserCustomDataField.objects.create(name='zodiac', site=my_site) #probably created in the admin
user = User.objects.create(username='foo')
user_sign = UserCustomDataValue(custom_field=custom_field, user=user, data='Libra')
user.custom_data.add(user_sign) #actually, what does this even do?

But this feels very clunky, particularly with the need to manually create the related data and associate it with the concrete model. Is there a better approach?

Options that have been pre-emptively discarded:

  • Custom SQL to modify tables on-the-fly. Partly because this won't scale and partly because it's too much of a hack.
  • Schema-less solutions like NoSQL. I have nothing against them, but they're still not a good fit. Ultimately this data is typed, and the possibility exists of using a third-party reporting application.
  • JSONField, as listed above, as it's not going to work well with queries.

 Answers

89

As of today, there are four available approaches, two of them requiring a certain storage backend:

  1. Django-eav (the original package is no longer mantained but has some thriving forks)

    This solution is based on Entity Attribute Value data model, essentially, it uses several tables to store dynamic attributes of objects. Great parts about this solution is that it:

    • uses several pure and simple Django models to represent dynamic fields, which makes it simple to understand and database-agnostic;
    • allows you to effectively attach/detach dynamic attribute storage to Django model with simple commands like:

      eav.unregister(Encounter)
      eav.register(Patient)
      
    • Nicely integrates with Django admin;

    • At the same time being really powerful.

    Downsides:

    • Not very efficient. This is more of a criticism of the EAV pattern itself, which requires manually merging the data from a column format to a set of key-value pairs in the model.
    • Harder to maintain. Maintaining data integrity requires a multi-column unique key constraint, which may be inefficient on some databases.
    • You will need to select one of the forks, since the official package is no longer maintained and there is no clear leader.

    The usage is pretty straightforward:

    import eav
    from app.models import Patient, Encounter
    
    eav.register(Encounter)
    eav.register(Patient)
    Attribute.objects.create(name='age', datatype=Attribute.TYPE_INT)
    Attribute.objects.create(name='height', datatype=Attribute.TYPE_FLOAT)
    Attribute.objects.create(name='weight', datatype=Attribute.TYPE_FLOAT)
    Attribute.objects.create(name='city', datatype=Attribute.TYPE_TEXT)
    Attribute.objects.create(name='country', datatype=Attribute.TYPE_TEXT)
    
    self.yes = EnumValue.objects.create(value='yes')
    self.no = EnumValue.objects.create(value='no')
    self.unkown = EnumValue.objects.create(value='unkown')
    ynu = EnumGroup.objects.create(name='Yes / No / Unknown')
    ynu.enums.add(self.yes)
    ynu.enums.add(self.no)
    ynu.enums.add(self.unkown)
    
    Attribute.objects.create(name='fever', datatype=Attribute.TYPE_ENUM,
                                           enum_group=ynu)
    
    # When you register a model within EAV,
    # you can access all of EAV attributes:
    
    Patient.objects.create(name='Bob', eav__age=12,
                               eav__fever=no, eav__city='New York',
                               eav__country='USA')
    # You can filter queries based on their EAV fields:
    
    query1 = Patient.objects.filter(Q(eav__city__contains='Y'))
    query2 = Q(eav__city__contains='Y') |  Q(eav__fever=no)
    
  2. Hstore, JSON or JSONB fields in PostgreSQL

    PostgreSQL supports several more complex data types. Most are supported via third-party packages, but in recent years Django has adopted them into django.contrib.postgres.fields.

    HStoreField:

    Django-hstore was originally a third-party package, but Django 1.8 added HStoreField as a built-in, along with several other PostgreSQL-supported field types.

    This approach is good in a sense that it lets you have the best of both worlds: dynamic fields and relational database. However, hstore is not ideal performance-wise, especially if you are going to end up storing thousands of items in one field. It also only supports strings for values.

    #app/models.py
    from django.contrib.postgres.fields import HStoreField
    class Something(models.Model):
        name = models.CharField(max_length=32)
        data = models.HStoreField(db_index=True)
    

    In Django's shell you can use it like this:

    >>> instance = Something.objects.create(
                     name='something',
                     data={'a': '1', 'b': '2'}
               )
    >>> instance.data['a']
    '1'        
    >>> empty = Something.objects.create(name='empty')
    >>> empty.data
    {}
    >>> empty.data['a'] = '1'
    >>> empty.save()
    >>> Something.objects.get(name='something').data['a']
    '1'
    

    You can issue indexed queries against hstore fields:

    # equivalence
    Something.objects.filter(data={'a': '1', 'b': '2'})
    
    # subset by key/value mapping
    Something.objects.filter(data__a='1')
    
    # subset by list of keys
    Something.objects.filter(data__has_keys=['a', 'b'])
    
    # subset by single key
    Something.objects.filter(data__has_key='a')    
    

    JSONField:

    JSON/JSONB fields support any JSON-encodable data type, not just key/value pairs, but also tend to be faster and (for JSONB) more compact than Hstore. Several packages implement JSON/JSONB fields including django-pgfields, but as of Django 1.9, JSONField is a built-in using JSONB for storage. JSONField is similar to HStoreField, and may perform better with large dictionaries. It also supports types other than strings, such as integers, booleans and nested dictionaries.

    #app/models.py
    from django.contrib.postgres.fields import JSONField
    class Something(models.Model):
        name = models.CharField(max_length=32)
        data = JSONField(db_index=True)
    

    Creating in the shell:

    >>> instance = Something.objects.create(
                     name='something',
                     data={'a': 1, 'b': 2, 'nested': {'c':3}}
               )
    

    Indexed queries are nearly identical to HStoreField, except nesting is possible. Complex indexes may require manually creation (or a scripted migration).

    >>> Something.objects.filter(data__a=1)
    >>> Something.objects.filter(data__nested__c=3)
    >>> Something.objects.filter(data__has_key='a')
    
  3. Django MongoDB

    Or other NoSQL Django adaptations -- with them you can have fully dynamic models.

    NoSQL Django libraries are great, but keep in mind that they are not 100% the Django-compatible, for example, to migrate to Django-nonrel from standard Django you will need to replace ManyToMany with ListField among other things.

    Checkout this Django MongoDB example:

    from djangotoolbox.fields import DictField
    
    class Image(models.Model):
        exif = DictField()
    ...
    
    >>> image = Image.objects.create(exif=get_exif_data(...))
    >>> image.exif
    {u'camera_model' : 'Spamcams 4242', 'exposure_time' : 0.3, ...}
    

    You can even create embedded lists of any Django models:

    class Container(models.Model):
        stuff = ListField(EmbeddedModelField())
    
    class FooModel(models.Model):
        foo = models.IntegerField()
    
    class BarModel(models.Model):
        bar = models.CharField()
    ...
    
    >>> Container.objects.create(
        stuff=[FooModel(foo=42), BarModel(bar='spam')]
    )
    
  4. Django-mutant: Dynamic models based on syncdb and South-hooks

    Django-mutant implements fully dynamic Foreign Key and m2m fields. And is inspired by incredible but somewhat hackish solutions by Will Hardy and Michael Hall.

    All of these are based on Django South hooks, which, according to Will Hardy's talk at DjangoCon 2011 (watch it!) are nevertheless robust and tested in production (relevant source code).

    First to implement this was Michael Hall.

    Yes, this is magic, with these approaches you can achieve fully dynamic Django apps, models and fields with any relational database backend. But at what cost? Will stability of application suffer upon heavy use? These are the questions to be considered. You need to be sure to maintain a proper lock in order to allow simultaneous database altering requests.

    If you are using Michael Halls lib, your code will look like this:

    from dynamo import models
    
    test_app, created = models.DynamicApp.objects.get_or_create(
                          name='dynamo'
                        )
    test, created = models.DynamicModel.objects.get_or_create(
                      name='Test',
                      verbose_name='Test Model',
                      app=test_app
                   )
    foo, created = models.DynamicModelField.objects.get_or_create(
                      name = 'foo',
                      verbose_name = 'Foo Field',
                      model = test,
                      field_type = 'dynamiccharfield',
                      null = True,
                      blank = True,
                      unique = False,
                      help_text = 'Test field for Foo',
                   )
    bar, created = models.DynamicModelField.objects.get_or_create(
                      name = 'bar',
                      verbose_name = 'Bar Field',
                      model = test,
                      field_type = 'dynamicintegerfield',
                      null = True,
                      blank = True,
                      unique = False,
                      help_text = 'Test field for Bar',
                   )
    
Tuesday, June 1, 2021
 
MassiveAttack
answered 6 Months ago
16

You have two foreign keys to User. Django automatically creates a reverse relation from User back to GameClaim, which is usually gameclaim_set. However, because you have two FKs, you would have two gameclaim_set attributes, which is obviously impossible. So you need to tell Django what name to use for the reverse relation.

Use the related_name attribute in the FK definition. e.g.

class GameClaim(models.Model):
    target = models.ForeignKey(User, related_name='gameclaim_targets')
    claimer = models.ForeignKey(User, related_name='gameclaim_users')
    isAccepted = models.BooleanField()
Tuesday, June 15, 2021
 
waylaidwanderer
answered 6 Months ago
86
  1. I think it is an opinion where to call web services. I would say don't pollute your models because it means you probably need instances of those models to call these web services. That might not make any sense. Your other choice there is to make things @classmethod on the models, which is not very clean design I would argue.

    Calling from the view is probably more natural if accessing the view itself is what triggers the web service call. Is it? You said that you need to keep things in sync, which points to a possible need for background processing. At that point, you can still use views if your background processes issue http requests, but that's often not the best design. If anything, you would probably want your own REST API for this, which necessitates separating the code from your average web site view.

    My opinion is these calls should be placed in modules and classes specifically encapsulated for your remote calls and processing. This makes things flexible (background jobs, signals, etc.) and it is also easier to unit test. You can trigger calling this code in the views or elsewhere, but the logic itself should be separate from both the views and the models to decouple things nicely.

    You should imagine that this logic should exist on its own if there was no Django around it, then build other pieces that connect that logic to Django (ex: syncing the models). In other words, keep things atomic.

  2. Yes, same reasons as above, especially flexibility. Is there any reason not to?

  3. Yes, simply create the equivalent of an interface. Have each class map to the interface. If the fields are the same and you are lazy, in python you can just dump the fields you need as dicts to the constructor (using **kwargs) and be done with it, or rename the keys using some convetion you can process. I usually build some sort of simple data mapper class for this and process the django or rest models in a list comprehension, but no need if things match up as I mentioned.

    Another related option to the above is you can dump things into a common structure in a cache such as Redis or Memcache. It might be wise to atomically update this info if you are concerned with "freshness." But in general you should have a single source of authority that can tell you what is actually fresh. In sync situations, I think it's better to pick one or the other to keep things predictable and clear though.

One last thing that might influence your design is that by definition, keeping things in sync is a difficult process. Syncs tend to be very prone to failure, so you should have some sort of durable mechanism such as a task queue or job system for retries. Always assume when calling a remote REST API that calls can fail for crazy reasons such as network hicups. Also keep in mind transactions and transactional behavior when syncing. Since these are important, it points again to the fact that if you put all this logic in a view directly, you will probably run into trouble reusing it in the background without abstracting things a bit anyway.

Monday, November 1, 2021
 
Autonomous
answered 1 Month ago
15

validate_unique is a Model method.

Running the superclass clean method should take care of model uniqueness checks given a ModelForm.

class MyModelForm(forms.ModelForm):    
    def clean(self):
        cleaned_data = super(MyModelForm, self).clean()
        # additional cleaning here
        return cleaned_data

There is a warning on the django docs specifically about overriding clean on ModelForms, which automatically does several model validation steps.

Monday, November 1, 2021
 
Twistar
answered 1 Month ago
34

I just instantiate a model object from the json data and call full_clean() on the model to validate: https://docs.djangoproject.com/en/dev/ref/models/instances/#django.db.models.Model.full_clean

m = myModel(**jsondata)
m.full_clean()
Sunday, November 14, 2021
 
Martin Vseticka
answered 2 Weeks ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :  
Share