Subset and Family .... are they the right names?

milan · 17 January 2023 21:27

I’ve been contemplating for a while whether the naming we’ve inherited from Avalon days holds up after the years of using it. We’ve already made the decision to tweak the naming a little bit as we’re working towards AYON v1.0 release where we renamed Asset to Folder to be more flexible and better aligned with the general perception of the entity name and what it holds.

Instead of having Episodes that are asset entities, in AYON we have a folder entity of episode type (or any other arbitrary type name).

However as we’re extremely close to a first beta release of AYON I want to open a wide discussion to validate two more terms we currently use: Subset and Family. They both served us well, but after speaking to many studios that are considering adopting OpenPype / AYON and going through a lot of training and on-boarding users over the past 4 years I have a suggestion for an alternative that feels like it fits better into terminology used in other pipelines. Now subset is a term I haven’t seen anywhere and family is a Pyblish technical term that stuck.

My suggestion would be to consider changing subset to product and family to product type, or subset type should the term subset stay. My main reasoning is that we have to spend quite some time explaining what we mean by subset and I almost always end up telling artists that they are technically various publishing products of tasks that make up a shot or an asset.

The product type is just something a lot more natural and better matching the fact that we have task type and folder type, hence I see little benefit in having essentially the same type of categorization treated differently in a different context.

Folder → Product → Version → Representation

I very much realize that family is a pyblish built in term and I don’t think we need to change that under the hood. My feeling is, however, that it would be substantially easier to explain the general concepts and on-board artists by telling them:
A Shot is made up of multiple products and each product has a type…
rather than our…
A Shot is made up of multiple subsets and each subset belongs to a family.

Just to be clear, this is very much a discussion topic, not a hard push from my end by any means, but we have the opportunity to re-evaluate if what we’re currently doing with terminology is for the benefit of the artist using the product, or just something we’re used to.

antirotor · 17 January 2023 21:53

I really like Product, but with the family, I am more inclined to Trait: as a subset usually has more families, it makes more sense for a Product to have more Traits. Those can map directly to underlying families, but in the future, they can serve as more than just “tags” and can have their own logic and structure. But I’ll do some more brain-hurting.

milan · 17 January 2023 22:53

I am more inclined to Trait as usually subset has more families, i

That’s exactly what I want to avoid. A published subset has only one main family, which is what the artist sees and how we refer to it. And it’s getting nasty mixups: a publishing instance can have as many families as we like for better plugin targeting, but should always end up with an explicit single main family like render, model, rig and so on. Traits just throw an extra spanner into that and add more complexity rather than remove some. The artist ultimately doesn’t care if modelMain happens to have extraStrageFamily on top of model, because it’s been used during the publishing process.

If we’d like to add more metadata for categorization we should again go with something ubiquitously familiar…like tags, which btw are fully functional in AYON already and can be completely arbitrary and even edited post publishing.

milan · 17 January 2023 22:55

Just to be clear… a Product can have any number of Traits if we reaaaally wanted to adopt openAssetIO naming (do we though?), but that is not the same as the user facing main family that’s visible in the loader for example.

antirotor · 17 January 2023 23:03

I don’t really think that Trait is just OpenAssetIO terminology, it is a whole concept in programming. I like it because it actually adds meaning to the data:

You can have an instance with traits Sequence, Reviewable, Colormanaged and you know that you can expect a frame range, you know that you can pass it to generate_review() because it has everything needed, and you know it has colorspace information. IMHO it is much cleaner than passing it along and constantly guessing whether you have what you need, and at the same time it can be used for categorizing (even though I agree that tags are much better for that).
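
A minimal sketch of that idea, assuming traits are modelled as small Python classes attached to an instance. The `Instance` class, the trait classes and the body of `generate_review()` are all hypothetical illustrations, not OpenPype, AYON or OpenAssetIO API:

```python
from dataclasses import dataclass, field


# Hypothetical trait classes: each one carries the data it promises.
@dataclass
class Sequence:
    frame_start: int
    frame_end: int


@dataclass
class Reviewable:
    fps: float


@dataclass
class Colormanaged:
    colorspace: str


@dataclass
class Instance:
    name: str
    traits: dict = field(default_factory=dict)

    def add_trait(self, trait):
        # Store one trait per trait class, keyed by the class itself.
        self.traits[type(trait)] = trait

    def has_traits(self, *trait_types):
        return all(t in self.traits for t in trait_types)


def generate_review(instance):
    # Refuse early instead of guessing whether the data is there.
    required = (Sequence, Reviewable, Colormanaged)
    if not instance.has_traits(*required):
        missing = [t.__name__ for t in required if t not in instance.traits]
        raise ValueError(f"{instance.name} is missing traits: {missing}")
    seq = instance.traits[Sequence]
    color = instance.traits[Colormanaged]
    return f"{instance.name} {seq.frame_start}-{seq.frame_end} [{color.colorspace}]"


instance = Instance("renderMain")
instance.add_trait(Sequence(frame_start=1001, frame_end=1050))
instance.add_trait(Reviewable(fps=25.0))
instance.add_trait(Colormanaged(colorspace="ACEScg"))
review_label = generate_review(instance)
```

The point of the sketch is only that the function can state what it needs up front and fail with a clear message when a trait is absent.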

BigRoy · 18 January 2023 11:56

Thanks for bringing up the OpenAssetIO terminology. I had never read that particular page before.

I really like how they describe Entity Data and I think it’s something we should define as well. Basically it says that a “Family” defines certain ‘data’ to be present on the subset (or version, or representation - I suppose). Anyway, I think that’s really really nice that we pre-specify that e.g. the UDIM trait says we’ll have a per-representation udim key, which is a str key identifying its UDIM tile number, and maybe even a udim_format if any UDIM spec other than 1001, 1002 like _u0_v1 is still a thing.

That could even mean e.g. that a pointcache can have a framesequence trait saying it MUST have a definition of frameStart, frameEnd, etc.
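
As a rough sketch of what such a pre-specified contract could look like, assuming each trait simply lists the keys it requires and their types. The trait names, the `TRAIT_SPECS` table and `validate_traits()` are made up for illustration and are not part of OpenAssetIO or AYON:

```python
# Hypothetical trait specs: trait name -> required keys and their types.
TRAIT_SPECS = {
    "frameSequence": {"frameStart": int, "frameEnd": int},
    "udim": {"udim": str},
}


def validate_traits(data, traits):
    """Return a list of problems; an empty list means the data is valid."""
    problems = []
    for trait in traits:
        for key, expected_type in TRAIT_SPECS[trait].items():
            if key not in data:
                problems.append(f"{trait}: missing '{key}'")
            elif not isinstance(data[key], expected_type):
                problems.append(f"{trait}: '{key}' must be {expected_type.__name__}")
    return problems


pointcache = {"frameStart": 1001, "frameEnd": 1050}
texture_tile = {"udim": 1001}  # wrong type on purpose: the spec asks for str

pointcache_problems = validate_traits(pointcache, ["frameSequence"])
tile_problems = validate_traits(texture_tile, ["udim"])
```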

Note that I’m quickly deviating from what is the ‘general’ type but really describing what TRAITS the particular publish has. I almost feel like we’ll additionally need to have Traits like some sort of metadata about the ‘data’ that’s present on a specific subset/family.

Boy, that makes me really excited!

jakub.jezek · 19 January 2023 12:20

All this also makes me very excited because I can see lots of confusion going on whenever we talk about subset, variants, instance during trainings. On the other hand, product is something which clicks immediately.

Also, replacing family/families in plugins with something like product_type and product_traits (with all the depth of the concept) would surely be more readable even to developers. In fact, we lost so much time just explaining to each other what the settings filter families relates to, whether it is the main family or the families class attribute.

milan · 19 January 2023 15:50

fact, we lost so much time just to explain to each other what settings filter families relates to, if it is main family or the families class attribute.

Amen!

Even though I would say that there’s no immediate need to replace the data['families'] concept in pyblish. You hit the nail on the head: the problem lies in the differentiation between data['family'], which we consider the main, user facing categorization at the end, and data['families'], which is pretty much used just for pyblish plugin filtering.

My proposition is to replace just data['family'], or rather promote it to a clearly defined type of a subset/product.

So pyblish wouldn’t need any changes really in the first wave. We’d merely make sure that whatever makes it to family information on a subset at the moment would be renamed to type and, importantly, could be validated against a schema of known subset types, same as you can only use predefined task types.

In the code we could continue to use data['families'] for any plugin filtering (or rename it, but that wasn’t the target of this discussion actually), but we’d have something much nicer on the user facing side.
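
To make the distinction concrete, here is a small sketch using a plain dict in place of a pyblish instance’s data. The `KNOWN_PRODUCT_TYPES` set and the `product_type_from()` helper are hypothetical; only the `family` and `families` keys come from the discussion above:

```python
# Hypothetical list of allowed types, similar to predefined task types.
KNOWN_PRODUCT_TYPES = {"model", "rig", "render", "look", "pointcache"}


def product_type_from(instance_data):
    """Promote the main family to a validated, user facing product type."""
    main_family = instance_data["family"]
    if main_family not in KNOWN_PRODUCT_TYPES:
        raise ValueError(f"Unknown product type: {main_family!r}")
    return main_family


instance_data = {
    # Main, user facing categorization.
    "family": "model",
    # Used for plugin filtering only; may hold anything useful there.
    "families": ["model", "review"],
}

product_type = product_type_from(instance_data)
```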

BigRoy · 17 January 2023 22:24

I personally like family quite a bit, but admittedly I’ve worked with it for a long time. Product Type to me sounds quite a bit more technical and verbose. (Maybe because of its length, since ‘type’ could work). The biggest issue I have with family is that it is Pyblish terminology, but the ‘pyblish plugin family’ is not in all cases a 1-to-1 correspondence with the published family.

What I like about the ‘family’ is that it feels both like a type but also like a ‘tag’. That’s its greatest strength and greatest weakness since sometimes you’d like to tag it more and sometimes you want it to be very explicit. “Trait” kind of has it too but maybe also sounds a bit too loose - not so strict. As if a model could just lose that ‘trait’ any moment.

Funnily enough I would have actually considered Product a synonym to Representation. Even though I don’t have any direct issues with representation except for that it’s so long - code often refers to it as repre. However it’s directly clear that whatever we come up with will require onboarding for someone, or heck I’d go as far as saying everyone. I think changing the names actually wouldn’t solve the problem unless the name is really colliding a lot with other terminology and just wouldn’t be remembered whatsoever. So I just wanted to also chime in and say that keeping the names wouldn’t be bad either.

Just to throw in some things as notes:

short is better, better to type, better to display, faster to read (that is as long as it is clear enough what it is)
we should avoid ‘near conflicts’ in naming. E.g. just the word type would be too generic as a replacement for family if we also refer to Folder Types a lot.
should be simple words

Also here are some words that come to mind with each of the names. (Note: some words pop up for multiples)

Folder: Asset, Container, Group, Space, Area

Family: Trait, Type, Kind, Class, Category, Identifier, Classification, Tag, Id, Format (too conflicting with what representation somewhat is?), Variant

Subset: Component, Item, Package, Bundle, Container, Creation, Result, Output, Product, Extract, Archive, Publish

Variant: Subcategory, label, name, tag

Representation: Format, Output, Product, File (not good, because it could be a sequence of files) or Component.

Some are really bad. But to me personally, some don’t stand out as ‘oh yeah that’s definitely so much better’.

“Product” I think could be a suitable replacement for Subset. Fits well with the USD render idiom for render product?

I wanted to bring up something else: do we potentially see a way where Subset might also not need to be prefixed in the name with its Family? Or do we still see a very valid reason to enforce that? (Or is that actually not the case anymore to begin with anyway?) The prefix sometimes makes the subset somewhat of a type/family identifier as well, even though subset naming originally was basically what Variant solves now?

Or is Subset explicitly meant to be the ‘family+variant’? If we think of these two as so closely related, then it’d be good to find combinations that make sense together; for example, Trait+variant to me feels quite weird together.

milan · 17 January 2023 22:33

For now just a quick answer to the subset naming. It is not enforced anywhere at all. familyVariant is just the default setting because it proved quite good, but we have clients that run subset name as {task}_{variant} for example when they have an uber linear workflow. All of the subset naming is fully based on templates, just as folder anatomy, so that should not be an issue hopefully.
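
A minimal sketch of template-based naming using Python’s `str.format`. The template strings, the `subset_name()` helper and the `Variant` key (variant with its first letter capitalized) are assumptions for illustration; the real templates live in the project settings and may differ:

```python
def subset_name(template, family, variant, task):
    """Fill a subset name template from the publishing context."""
    fill_data = {
        "family": family,
        "task": task,
        "variant": variant,
        # Capitalized key so "{family}{Variant}" gives camelCase names.
        "Variant": variant[:1].upper() + variant[1:],
    }
    return template.format(**fill_data)


default_name = subset_name("{family}{Variant}", "model", "main", "modeling")
linear_name = subset_name("{task}_{variant}", "model", "main", "modeling")
```

With these inputs the first template yields modelMain and the second yields modeling_main, so the family prefix is purely a matter of which template is configured.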

iLLiCiT · 19 January 2023 15:47

Small notes: Some clients even have a hardcoded string inside the subset name template. BTW some host creators support custom keys in the subset name template (e.g. renderLayer, …).

milan · 17 January 2023 22:48

Regarding the rest of the entity names. Indeed all of them have synonyms, and some very suitable ones at that; however, I’ve never had anyone wonder about version and representation names, so I believe those are spot on.

The Asset → Folder transition has gone through a lot of discussion and iterations over the past year and that’s the one where I’m really confident we won’t need any explaining at all. Especially considering that AYON makes use of full paths as identifiers as well as IDs, so you can refer to a Folder with episodes/ep101/sh010 for example. Pretty much matching the filesystem.

The Family change suggestion is actually meant to solve exactly the issues you mention. We’d get rid of the strange overlap between the pyblish family used for plugin filtering and the user facing family, which are not always quite the same.

I suggested type because it’s ubiquitously used in many systems to categorize something. We already use it across the board and it would fit very well within the system as it’s just another delimiter. Folder[type], Task[type], Product[type]. I believe that it’s simpler and consistent.

I don’t think there’s a huge benefit to either of the changes, but it would make the whole data model a bit more human friendly. Especially the family change. It’s not part of the data model, it’s a string parameter that we happen to use and we don’t even list what families we have anywhere. Changing it to type might also give us a bit of a push to actually formalize it in a similar way as we formalize folder types and task types.

munkybutt · 18 January 2023 11:21

Would it be possible to have these just be variables that each studio can define themselves to match their internal terminology?
Under the hood it can use whatever terminology makes sense for the codebase, but if the issue is about being able to explain during onboarding, letting studios use their own terminology is the most ideal.
For us, SubSet is Variation and Family works just fine.
When it comes to naming conventions there is never a perfect choice so flexibility to modify to fit the environment is most ideal (though obviously the more difficult thing to do)

BigRoy · 18 January 2023 11:32

I’d actually avoid that because it’d just raise even more eyebrows when discussing potential issues with Ayon if everybody speaks a different terminology. I’d say - pick what we together feel is the way forward and then document the hell out of them HERE.

Actually I’d like to take this moment to point out that the Glossary for Key Concepts for artists there is missing some key terminology:

Instance
Variant

I’d even go as far as adding “Dependency” or at least link to a page that describes some more of the technical terminology.
Similarly I think we should document the difference between “Application” and “Host”.
And maybe even add “Context” and “Environment (Variables)” with a brief description for anyone to learn from. And Anatomy. Going forward with Ayon I’d even like to see Addon or the like in the glossary.

Application vs Host
Context
Environment (Variables)
Anatomy
Versioned Settings

milan · 18 January 2023 11:45

That is more than certainly something we need to tackle across the board. There is a very important distinction though that I would like to keep in mind. The glossary should almost be done twice. Once with all the concepts and techy details for TDs and developers and once with limited information and only with whatever terminology is user facing for the artist.

There is absolutely zero value trying to explain to the artist what a publishing instance is. I’d argue they should never even see that in the publisher GUI and only ever refer to it as publishable subset/product whatever. Nevertheless you are right that docs have massive holes on this front.

milan · 18 January 2023 11:47

For us, SubSet is Variation

@munkybutt interesting. So you’d refer to all of these modelMain, modelProxy, rigSimulation, lookWet as just “variations” of an Asset?

We have variant, which is essentially the second part of the subset name at the moment, but referring to the whole thing as variation must be somewhat confusing.

iLLiCiT · 19 January 2023 15:51

Technocratic comment: The “family” is also one of the filters for load plugins.

milan · 19 January 2023 16:00

But only because we use it as the main family kind of data. That would essentially change to filter by subset type
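
A sketch of what that filter could look like, assuming loaders declare the types they accept. The loader classes, the `product_types` attribute name and `loaders_for()` are hypothetical, not the actual load plugin API:

```python
class ModelLoader:
    # Hypothetical attribute: the types this loader can handle.
    product_types = {"model", "pointcache"}


class RigLoader:
    product_types = {"rig"}


LOADERS = [ModelLoader, RigLoader]


def loaders_for(product_type):
    """Return the loaders that accept the given product type."""
    return [loader for loader in LOADERS if product_type in loader.product_types]


model_loaders = loaders_for("model")
```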

BigRoy · 12 April 2023 18:10

@fabiaserra @gabor.marinov I feel like this could be a good topic to get your opinion on too.

milan · 12 April 2023 18:58

Second that. We’ve been mulling this over for a long time, but there isn’t any strong push either way to be honest. At the very least there is no super strong reason to change subset to product (my personal feeling is that it’s a good idea).

Regarding the family → subset type or product type, now that would be very useful to unify.