I would like to start a discussion about handling the point cache family in Ayon. There are some similarities between formats like abc/bgeo/usd/… that can be handled by the same logic, but at the same time they require different approaches to validation, extraction and even loading.
Putting here two comments that appeared on #4588 so the discussion could proceed here:
I was testing the Houdini publisher when I got confused (again, because I remember coming across this same confusion a couple of weeks back) trying to find an Alembic cache family and realizing that it was the Point Cache. That got me thinking and reminded me of this PR where you were creating a new family. I’m noticing there’s quite a semantic divergence in how you have been creating the more recent families, and I don’t understand why we are not making use of representations instead of creating new families for each file format.
Wouldn’t it make more sense to have a single Point Cache family that allows us to define which representations we want for it (i.e., Abc, Bgeo, ASS, USD…) so it’s up to the project settings to choose what data formats we want our families to be in? Same applies to Model, Camera… In my prior studio we published the different file formats as components on the same ftrack asset version, so sometimes you’d prefer to import the bgeo representation but at other times the alembic cache (same for textures to store .exr and .tx or shaders for .mtlx, .ass…). Of course there will be occasions where there might be some slight differences in what’s stored in the representation due to the mismatch of features, but I think there are some big limitations with the alternative. I wouldn’t want the artists to select every family we choose to support on the project. I would rather have them only need to know they have to publish a “Cache”, and in our settings I could choose to not only create a Bgeo cache but perhaps a USD one as well; that way I could slowly migrate to a different format. Does this make any sense or do I need some extra coffee?
Wouldn’t it make more sense to have a single Point Cache family that allows us to define which representations we want for it (i.e., Abc, Bgeo, ASS, USD…)
It’s not that trivial due to them actually being separate ROP nodes, and also requiring different validators to trigger in production, so the families would need to be dynamic per instance. The new publisher doesn’t yet allow settings in the UI to be unique PER instance, I believe; they are based on the (primary?) family. Or could we actually mix/match families in the Creator already @iLLiCiTiT? A bgeo + pointcache instance might have different settings than just a bgeo instance? So the complexity of maintaining that increases and the flexibility of the individual nodes somewhat goes down, I’d say. It’s not impossible, however, but this would definitely require a custom HDA to make it maintainable.
I’m not entirely sure however about writing say bgeo + alembic + usd family into a single subset. Again, it’s possible - just add the different families into a single instance and it does it. But does OpenPype / Ayon allow changing the family of an existing subset midway, like removing an alembic family and adding a bgeo if the next version doesn’t have it? Sounds dangerous if we start wiggling that around and someone starts ticking off and on a specific “output family” for the instance when publishing new versions?
Older iterations of OpenPype / Avalon had families on versions so they were per version, but at some point they were moved to Subsets to make them more static for a reason: to avoid the data expected in a version suddenly being something different. E.g. latest version 8 not having a USD file, so to what version do you update then when updating the loaded USD? Do you report an error? Take the last existing? etc. Complexity does quickly increase.
Another side note - for e.g. USD you tend to set a different “path” (a LOPs path instead of the SOP path used for Alembic), etc. So even setting what you’re outputting could differ?
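To make the update question concrete, here is a minimal sketch of the two policies mentioned above ("report an error" versus "take the last existing"). It is plain Python and does not use the OpenPype / Ayon API; the `VERSIONS` data, the `resolve_update` function and the policy names are all hypothetical and only illustrate the trade-off.

```python
from typing import Dict, List

# Hypothetical data: version number -> representation names published in it.
VERSIONS: Dict[int, List[str]] = {
    6: ["abc", "usd"],
    7: ["abc", "usd"],
    8: ["abc"],  # the latest version was published without a USD file
}


def resolve_update(versions: Dict[int, List[str]],
                   representation: str,
                   policy: str = "error") -> int:
    """Return the version a loaded representation should update to.

    policy="error":         fail if the latest version lacks the representation.
    policy="last_existing": fall back to the newest version that has it.
    """
    latest = max(versions)
    if representation in versions[latest]:
        return latest

    if policy == "error":
        raise LookupError(
            f"Representation '{representation}' does not exist in v{latest:03d}"
        )

    if policy == "last_existing":
        candidates = [v for v, reps in versions.items() if representation in reps]
        if not candidates:
            raise LookupError(f"No version contains '{representation}'")
        return max(candidates)

    raise ValueError(f"Unknown policy: {policy}")
```

With this data, updating a loaded `abc` resolves to version 8, while updating a loaded `usd` either raises an error or silently stays on version 7, which is exactly the ambiguity described above.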
I’d say let’s make a dedicated issue or discussion about this - I think it’s at least worth investigating the options in that area.
Thanks @BigRoy for the quick response and explanation!! I definitely agree that the complexity would increase, but knowing how many times at my prior studio we were able to fix production needs by having the flexibility of easily swapping representations/components of any existing asset version makes me worry whether that’s something that could become a blocker with the current design.
Are the key concepts definitions from the OP docs outdated? I’m a bit confused when you mention “bgeo + alembic + usd family into a single subset”. Wouldn’t that be “bgeo + alembic + usd representations into a single subset”? Reading that glossary again (Key concepts | openPYPE) this seems to describe exactly how I had envisioned the design to be, you choose the family, the subset and then the representations of that subset (family > subset > representation). But from what I’m seeing, “representations” aren’t really used here, mostly just for storing a thumbnail of the same subset?
I think that creating custom HDAs might be a requirement to scale the OP integration and make it more Houdini native in the long run (allowing artists to create node networks directly and set any dependencies for doing publishes instead of needing to go through the UIs, although that would still be possible). At my prior studio we had a system that hijacked the hou.Node python class by inserting it as a base class into our custom python classes, so you could write any python code and call it for any HDA or Houdini node directly from the node’s python object (i.e., hou.node("custom_node").custom_function(), hou.node("alembic_node").custom_function()), without needing to write the code within the HDA. We also had custom nodes with very simple purposes (i.e., “OP Publisher”, “OP Cache”), so for TDs (and even for developers providing new publishers) it was very easy to build more complex functionality by combining the smaller nodes (think of Bifrost Compounds or Nuke gizmos).
As for your worry about “representations” missing across different versions, the way we did it was very simple, if you try to update to a newer version that doesn’t have the representation you are using, it just errors out saying that the “representation” doesn’t exist. However, that was very uncommon as most artists didn’t mess with the default job settings of which “representation/component” should be published by family. On the other hand, imagine the alternative situation where I might want to have “Bgeo” and “Alembic” caches throughout the entire production. How do you currently keep those separate families “on sync” with this design? If they don’t always publish both, the versions get out of sync and all of a sudden it makes it very hard to know which family “represents” the other family. I think that’s where “representations” should really be used for?
Anyway, I’m still very new to OpenPype and I still need to fight a lot more with the Houdini OP implementation to get all of these concepts wrapped in my brain, I might change my mind again then haha
Definitely agree to start a GH issue/discussion to discuss this further.
Are the key concepts definitions from the OP docs outdated? I’m a bit confused when you mention “bgeo + alembic + usd family into a single subset”. Wouldn’t that be “bgeo + alembic + usd representations into a single subset”? Reading that glossary again (Key concepts | openPYPE) this seems to describe exactly how I had envisioned the design to be, you choose the family, the subset and then the representations of that subset (family > subset > representation). But from what I’m seeing, “representations” aren’t really used here, mostly just for storing a thumbnail of the same subset?
The family is what describes how the publishing behaves, but also how the loaders behave. An Alembic loader might need a different node in Houdini than say a USD loader, a COP2 loader or a BGEO (or file) loader; as such, the family is what defines what “type” the data is that you can act on. That family is stored on the subset, e.g. saying this subset is a “RIG” family, a “POINTCACHE” family (where pointcache has meant alembic through most of OpenPype’s lifetime) or a “BGEO” family.
It would be slightly more ‘precise’ than saying an alembic == family because for example in Maya we have a model family and a pointcache family. The model family also produces an Alembic as representation but comes with additional rules for the model (e.g. it must be a single group, must match certain naming conventions, must have frozen transforms, must have x, y, z), where a pointcache might have different rules.
It’s the family that describes whether a Loader plug-in can load the data - not the representation name, extension or family (even though extensions filtering has been added recently #3847)
On the other hand, imagine the alternative situation where I might want to have “Bgeo” and “Alembic” caches throughout the entire production. How do you currently keep those separate families “on sync” with this design? If they don’t always publish both, the versions get out of sync and all of a sudden it makes it very hard to know which family “represents” the other family. I think that’s where “representations” should really be used for?
In the past I’d given it two options:
1. They go out of sync, but you track their input dependencies to see what data they were based on. Not as simple as seeing it in the blink of an eye in the loader since they’d have different versions. But technically there’s nothing holding us from identifying what was based on what workfile version (this is already submitted along with the publish) and what input dependencies were loaded in the scene (collected for many of the integrations so far), etc.
2. Enable Collect Scene Version and the published versions will match the version of your scene name. So skipping the publish for one of the subsets would just not have that particular version. Then publishing again with both, they’d both get the newer version number so one could e.g. have v001, v002, v003 and the other only v001, v003.
Option 2 is what we’re now using mostly in production, just due to how simple it is for the artist to identify both the version to the workfile version, etc. (Even though the UI does allow tracing further with the data artists often don’t need to debug further than that)
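As a rough illustration of the Collect Scene Version behaviour described above, the sketch below derives the publish version from the workfile name and records it per subset. This is not the actual OpenPype collector; the `_v###` filename pattern, the subset names and the `publish` helper are assumptions made up for the example.

```python
import re
from typing import Dict, List


def workfile_version(filename: str) -> int:
    """Extract the version number from a workfile name such as 'shot_v003.hip'."""
    match = re.search(r"_v(\d+)", filename)
    if match is None:
        raise ValueError(f"No version found in workfile name: {filename}")
    return int(match.group(1))


def publish(history: Dict[str, List[int]],
            workfile: str,
            subsets: List[str]) -> Dict[str, List[int]]:
    """Publish the given subsets using the workfile version as publish version."""
    version = workfile_version(workfile)
    for subset in subsets:
        history.setdefault(subset, []).append(version)
    return history


# Hypothetical subsets: the bgeo subset is skipped for the second workfile.
history: Dict[str, List[int]] = {}
publish(history, "sh010_fx_v001.hip", ["pointcacheMain", "bgeoMain"])
publish(history, "sh010_fx_v002.hip", ["pointcacheMain"])
publish(history, "sh010_fx_v003.hip", ["pointcacheMain", "bgeoMain"])

print(history)
```

One subset ends up with versions 1, 2, 3 and the other with 1, 3, so matching version numbers always point back to the same workfile.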
I don’t agree. Loader has intrinsic compatibility with a combination of Family and Representation (changed to extension now I believe) and that is how it should be. Representations are exactly to distinguish between bgeo, abc and whateverFormat of the same version of data.
We already do this in multiple places. You can have geometry as obj, abc and others and have loaders that deal with them completely differently.
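A minimal sketch of that idea, where loader compatibility is decided by the combination of family and representation. The `Loader` dataclass, the loader names and the `loaders_for` helper are hypothetical and much simpler than the real loader plug-ins.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Loader:
    name: str
    families: Tuple[str, ...]          # catalogue-level categories it accepts
    representations: Tuple[str, ...]   # file formats it knows how to load

    def is_compatible(self, family: str, representation: str) -> bool:
        # Both the family and the representation have to match.
        return family in self.families and representation in self.representations


# Hypothetical loaders for illustration only.
LOADERS = [
    Loader("AbcLoader", ("pointcache", "model", "camera"), ("abc",)),
    Loader("BgeoLoader", ("pointcache",), ("bgeo",)),
    Loader("CameraLoader", ("camera",), ("abc",)),
]


def loaders_for(family: str, representation: str) -> List[str]:
    """Return the names of all loaders able to load this family/representation."""
    return [
        loader.name
        for loader in LOADERS
        if loader.is_compatible(family, representation)
    ]
```

Here a `pointcache` subset with a `bgeo` representation is only offered the bgeo loader, while a `camera` published as `abc` can use either the generic Alembic loader or a camera-specific one.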
I strongly believe we should have the least amount of families possible. And in the case that sparked this topic (bgeo publishing from houdini I believe), it should absolutely be point cache family with bgeo representation, otherwise we’re completely mixing terminology.
Just to add to that, because I suspect the example would come up. We can now have model with abc representation and mov representation. They absolutely don’t hold the same data, but they both show a representation of the given version of a model at a particular point in time. Hence they are both valid representation of the data.
We should think of family less as a technocratic term and more as a cataloguing term, while representation is purely for technical differentiation.
True, but we are still hitting one important point here (and I’ve heard it multiple times from clients) - the Point cache loader/creator doesn’t really tell you what type of data you are actually publishing and loading. Like you just know you’ll be using bgeo in your project, but with creator and loader named pointcache, you’ll never see bgeo anywhere. This is related to family aliases. You can ofc argue with artist training, habit and so on and all of this is valid. But at the same time it is difficult to argue with a production that is used to using bgeo all the time and convince them that this is now bgeo and in maya perhaps just alembic.
Now with extension support in loader this can definitely be united under Point Cache family, specific validations per format can be handled by adding format specific family into families so the primary family isn’t contaminated.
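A small sketch of how format-specific validation could hang off a secondary family while the primary family stays `pointcache`. It mimics the pyblish-style `family` / `families` instance data in plain dictionaries; the validator names and the `VALIDATORS` table are invented for illustration and are not existing plug-ins.

```python
from typing import Dict, List, Set

# Hypothetical validators mapped to the families they should run for.
VALIDATORS: Dict[str, Set[str]] = {
    "ValidateFrameRange": {"pointcache"},    # applies to every point cache
    "ValidateAlembicPaths": {"abc"},         # Alembic-only rule
    "ValidateBgeoOutputNode": {"bgeo"},      # bgeo-only rule
}


def instance_families(instance: dict) -> Set[str]:
    """Combine the primary family with any additional families."""
    return {instance["family"], *instance.get("families", [])}


def validators_for(instance: dict) -> List[str]:
    """Return the validators whose target families overlap the instance's."""
    families = instance_families(instance)
    return sorted(
        name for name, targets in VALIDATORS.items() if targets & families
    )


# The primary family is the same; only the format-specific family differs.
bgeo_instance = {"family": "pointcache", "families": ["bgeo"]}
abc_instance = {"family": "pointcache", "families": ["abc"]}
```

Both instances are catalogued as `pointcache`, yet each one only triggers the validators relevant to its output format.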
Since we now have a unique creator_id, maybe we can make the label dynamic so that a studio with a particular workflow could rename the creator from Point Cache to BGEO Cache to make it more accessible for artists.
I know a model in our pipeline adheres to certain standards that a pointcache family doesn’t have to. For models in our pipeline I know for a fact that:
The cbId in the cache are unique. A pointcache doesn’t require that.
The cbId belong to a single asset (and thus the content belongs to a single asset). A pointcache doesn’t require that.
There are certain UV validations + transform validations in place to ensure there’s a specific geometrical consistency for models. Pointcache doesn’t require any of that.
I’d be fine if the resulting output file is in fact also marked as a pointcache but I most definitely want to have it visually clear that it is model certified or adheres to the model rules. Additionally the clarity gained in the loader from it being a model instead of pointcache helps there as well - seeing the family “model” there tells me, this is a clean model for our pipeline ingestion.
To me it’s almost the same as saying a “model”, which publishes a .ma file by default too, is in essence just a “mayaScene” publish. I can expect way more of the output of a Model publish than anything that can be in a maya scene publish. A mayaScene publish != model publish even though they generate the same file representation extension.
Note also that a model publishes a maya scene representation which is not a pointcache format according to the time sampled geometry data definition? How would that hold up?
This same differentiation for me exists for “camera” and “pointcache”. Sure they can both hold the same data, but with a camera publish I know for a fact that I’m getting a camera. Again I’m fine with the published file also being marked ‘pointcache’ but even on loading we tend to provide different rules for the loading behavior for cameras (e.g. locking the camera, preserving model panels on update or other camera specific tweaks).
I think you steered this to an extreme example. I’m not saying that we should replace model with pointcache. Just trying to illustrate that, considering it’s ok for model to have various file format representations even though they don’t ultimately hold identical data, it should be ok to have a pointcache which allows .abc, .bgeo or anything else rather than creating a bgeo family.
Your post illustrates exactly the point that family is a high level categorization. camera, model and pointcache all can be represented as abc. But family tells me what is their purpose, and indeed as you said, what standards they adhere to.
All three of those families might very well be represented by .bgeo as well.
So to get back to the point of this topic. It should be absolutely fair game to add file formats as representations to point cache family, rather than making new families all the time.
To make that clear though we should also be super clear about what exactly is allowed in point cache and my take on that is, that it should be fairly minimal: Animated geometry data (and yes, a single frame is fine too).
No bones or joints
Caches in general can then be categorized as vdb cache, point cache, hair cache (currently only yeti cache I believe), whatever cache.
Each of those tells you the purpose and how it might have been validated, but a whole bunch could easily be bgeo or abc.
Just want to mention that pointcache doesn’t adhere to these rules currently - and cameras (and I believe bones too?) would export from Maya. We’ve used that in the past too - whenever say we needed an alembic export of that data combined (e.g. tracking data of camera + geo).
If pointcache would by definition not allow cameras it sounds like we might then also be needing a family or alike for one that does?
Sorry I missed the follow up of this conversation, I think I still need to tune my notifications of this forum so I catch the ones I’m interested in! haha
Anyway, what @milan described is exactly the concern I had when I brought up this topic and I totally agree with his points. We should aim at not mixing data “representation” with “family”!
Examples of what I think are good family names (using the names we used in my prior studio so you see another vision of the same purposes):
Shaders (equivalent of look in OP)
Geometry (equivalent of model in OP but expands to any static geometry data)
Geometry cache (equivalent of point cache in OP)
Scene (this is what we would use on @BigRoy’s example of publishing a combination of things that don’t fit on one of the other families)
And family names that I would consider wrong:
I hope the distinction is clear and why the latter is wrong to scale your pipeline and make it flexible to any data formats.
but with creator and loader named pointcache, you’ll never see bgeo anywhere
@antirotor I don’t get what you mean by this. The loaded “representation” in the scene should always be very visible to the artist in the Scene Manager, and they are ultimately always the ones that choose in the loader what “representation” of the “family” to load. In fact, it should be quite a common workflow for a pipeline to let artists switch from one “representation” to another in a lot of cases (go from Alembic GPU cache to Alembic or even load it as a Standin if you want); that’s really what “representations” should be for.