Enhancing Houdini integration

mustafa_jafar · 28 September 2023 19:08

It has been discussed and mentioned several times that the current Houdini integration falls short of many requirements.

For example:
It doesn’t support procedural publishing, and it doesn’t make use of existing tools such as the Deadline tools, which means re-implementing them (re-inventing the wheel!).

So, let’s define the first principles of publishing and please correct me if I’m wrong:

Pyblish Concept

It’s based on injecting validation scripts between clicking the render button and the actual rendering.
These validations should troubleshoot the artist’s work.

The way Pyblish (technically) works is by adding metadata (special attributes) on top of the artist’s work (let’s call it a publish instance); Pyblish can then explore the scene file and collect these publish instances.
Finally, Pyblish runs the validation scripts and, if everything passes, runs the export command associated with each publish instance.
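
The collect → validate → export flow described above can be sketched in plain Python. This is only an illustration of the concept, not Pyblish's actual API: `PublishInstance`, `collect`, `validate_has_output` and `publish` are hypothetical names, and scene nodes are modelled as simple dicts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class PublishInstance:
    """Hypothetical stand-in for a tagged piece of artist work."""
    name: str
    data: Dict[str, object] = field(default_factory=dict)


def collect(scene_nodes: List[dict]) -> List[PublishInstance]:
    """Keep only the nodes that carry the publish metadata tag."""
    return [
        PublishInstance(node["name"], dict(node["metadata"]))
        for node in scene_nodes
        if node.get("metadata", {}).get("publish")
    ]


def validate_has_output(instance: PublishInstance) -> List[str]:
    """Example validator: returns error messages (an empty list means OK)."""
    if not instance.data.get("output"):
        return [f"{instance.name}: no output path set"]
    return []


def publish(
    scene_nodes: List[dict],
    validators: List[Callable[[PublishInstance], List[str]]],
    export: Callable[[PublishInstance], object],
) -> Tuple[list, List[str]]:
    """Collect, validate everything, and export only if nothing failed."""
    instances = collect(scene_nodes)
    errors = [
        message
        for instance in instances
        for check in validators
        for message in check(instance)
    ]
    if errors:
        return [], errors
    return [export(instance) for instance in instances], []
```

Nothing is exported unless every validator passes for every instance, which is the "troubleshoot before export" behaviour the concept relies on.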

To adopt it in OpenPype:
We introduced the create and publish menu actions, which behave a little differently in each DCC.
In Maya:
create should create a container with metadata (a set)
publish should find all containers (sets), then validate and export them one by one in alphabetical order by their subset name.

For DCCs like Maya it fits well because everything is straightforward:

artists can create sets with the proper meta data themselves (or even using a quick script)
artists can drag and drop new objects in that set to include them in that specific publish
they click OpenPype > Publish instead of selecting the set and clicking File > Export Selection...

With DCCs like Houdini, there are always many ways to achieve the same thing, due to its node system and procedural nature.
So, it’s quite limiting to adopt Pyblish in Houdini the way it’s adopted in Maya.

And let’s not forget the post-publish OpenPype/AYON functions.

Houdini Publish instances

So, what should a Houdini publish instance be?

Should it be a ROP node (as in the current implementation)?
or
Should it be a SOP output node or a ROP node that can be exported by ROPs or TOPs?

Output SOP nodes can be amazing, but we can’t use them for cameras, for example!

output SopNode	rop RopNode	fetch RopNode	wedge RopNode	subnet RopNode

rop SopNode	rop TopNode	fetch TopNode	wedge TopNode	subnet TopNode

Publish system

Note that publishing depends on how we define publish instances!

How should publishing in Houdini work?

Should a separate plugin (a publisher for example) grab all the publish instances and publish them one by one (the current implementation)?
or
Should each publish instance publish itself without invoking the publisher? like a series of nodes (render, publish)
or
Should both be supported ?

I need to point out that each publish instance has a pre-process → (validations) and a post-process → (extract, integrate).
So, in order to use the vanilla Houdini ROP nodes, we encapsulate them with a pre-process and a post-process. We can achieve that with either:

pre and post nodes [image]
a single input/output wrapper node (subnet) [image]
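
The wrapper idea above (validations before the render, extract/integrate after it) can be sketched as a plain Python function. This is a minimal illustration under assumptions: `wrap_render`, `ValidationError` and the callables passed in are hypothetical, and the `render` callable stands in for whatever actually triggers the vanilla ROP.

```python
from typing import Callable, Iterable, List


class ValidationError(RuntimeError):
    """Raised when a pre-process check rejects the instance."""


def wrap_render(
    render: Callable[[], List[str]],
    validators: Iterable[Callable[[], None]],
    integrate: Callable[[List[str]], None],
) -> Callable[[], List[str]]:
    """Return a callable that runs validators, then render, then integrate."""
    checks = list(validators)  # materialise so the wrapper can be called repeatedly

    def wrapped() -> List[str]:
        for check in checks:
            check()            # pre-process: any exception aborts before rendering
        outputs = render()     # the untouched vanilla render step
        integrate(outputs)     # post-process: hand the written files to the pipeline
        return outputs

    return wrapped
```

Whether this lives in separate pre/post nodes or in a single subnet wrapper, the ordering is the same; the subnet variant just hides the three steps behind one node.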

Publishing to Farm

Let’s take the Deadline addon as an example.

AYON has its own Deadline addon, where we send two jobs: a render job and a publish job.

For other DCCs that’s a fantastic feature,
but for Houdini it looks like we are duplicating existing features that already work well!

So, how would we reuse them?
I think it would work if we could

fake that the job was submitted from AYON/OpenPype (this happens when adding environment variables to your submission)
have a publish node

Here’s my demo test:

Steps I made:
1. save a hip file on shared storage
2. add a sphere and a geo rop node
3. connect it to a deadline otl
4. submit a job

At first glance it fails, but as soon as I added the essential environment variables, it worked as if it were submitted from Ayon/Openpype. (I used Deadline Monitor to add these vars, and I don't know how to do that from the deadline otl.) [image]
Here's proof from the log that it worked the same as any job published from Ayon/Openpype: you can see that `GlobalJobPreload` is triggered! [image]
Note that this is my Houdini Deadline configuration, and there is no way Houdini will run without running `GlobalJobPreload`. [image]

The current implementation

Each Publish instance is a ROP node whose instance.data includes
- Frame Range
- Output Node
- Output File
- ROP specific parameters (e.g., path parameter in alembic ROPs)
- extra attributes
  - id
  - family (product-type)
  - subset
  - active
  - creator_identifier
  - variant
  - asset
  - task
  - instance_node
  - instance_id
  - creator_attributes
  - publish_attributes

Publish System
- Publisher grabs all ROP nodes
- Runs validations and other things on the publish instance.data
- Does some OpenPype/Ayon related operations
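
As a rough picture of the data listed above, here is what such an instance.data mapping might look like as a Python dict, with a small check for the extra attributes. The extra-attribute key names come from the list; the first three key names and every value are made-up placeholders, not what the integration actually stores.

```python
from typing import Dict, List

# Key names taken from the "extra attributes" list above.
EXTRA_ATTRIBUTE_KEYS = (
    "id", "family", "subset", "active", "creator_identifier", "variant",
    "asset", "task", "instance_node", "instance_id",
    "creator_attributes", "publish_attributes",
)

# Every value below is a placeholder chosen for illustration only.
example_instance_data: Dict[str, object] = {
    "frame_range": (1001, 1050),                # hypothetical key name
    "output_node": "/obj/geo1/OUT",             # hypothetical key name
    "output_file": "/path/to/cache.$F4.bgeo",   # hypothetical key name
    "id": "<instance type id>",
    "family": "pointcache",
    "subset": "pointcacheMain",
    "active": True,
    "creator_identifier": "<creator id>",
    "variant": "Main",
    "asset": "<asset name>",
    "task": "<task name>",
    "instance_node": "/out/pointcacheMain",
    "instance_id": "<uuid>",
    "creator_attributes": {},
    "publish_attributes": {},
}


def missing_extra_attributes(data: Dict[str, object]) -> List[str]:
    """Return the extra-attribute keys that are absent from `data`."""
    return [key for key in EXTRA_ATTRIBUTE_KEYS if key not in data]
```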

fabiaserra · 30 September 2023 22:25

Thank you for starting this document. Let me first try to give you a collection of some of my thoughts, and then I will respond to the questions you raised in another reply.

As I said in another post, my practical experience with OP in Houdini is still quite recent and minimal, and I might be missing some of the functionality that exists… so correct me if I say anything wrong.

From my experience, the main problem with OP in Houdini is that it’s too opinionated about how artists need to work and which outputs it supports, and it’s very different from the normal node-based workflow they are used to (Houdini artists don’t want, and shouldn’t need, to open a separate UI to run a cache/extract/publish, load “subsets” or manage the loaded versions; they should be able to do all of that through the parameter interfaces of the nodes). Houdini already provides multiple ways of writing data out to disk, either locally or on the farm, through vanilla farm orchestrators (as I described here https://github.com/ynput/OpenPype/pull/5621#issuecomment-1732166830), and most artists and studios will have a bunch of utility tools for caching and rendering and their own very particular ways of working, with parameter presets on the vanilla Houdini nodes or HDAs that export custom data. That’s the beauty of Houdini. However, if you want to publish any of those outputs to the OP database right now, you are almost forced to stick to its Creator nodes, which until very recently you could only create through the Creator UI and not like a normal node creation in Houdini. Thankfully we can now also create the creator nodes directly through the tab search, but even then, artists don’t remember that they have to create a different node, and those nodes are too rigid, with hard-coded parameters, and they don’t cover all the types of outputs you’d want to be able to publish.

The first MVP OP integration for Houdini should just provide a means to publish any of the generated outputs to the database so other downstream artists can consume them. OP shouldn’t have an opinion on whether the family has to be called “mantra_rop”, “arnold_rop”, or any of the hard-coded family names. Most of the outputs are simply either a single file or a sequence, and the studio should be free to choose how they want to name that output in the database, which metadata to include and what “family” to put it under (“render”, “geometry”, “base_model”, “geocache”, “reference”, “contact_sheet”, “aov”, “image_sequence”, “groom” or “potato”). OP’s barebones layer shouldn’t have an opinion on what validations you need to run for these. That thin integration would be very easy to adapt to most studios, and any Houdini TD would be able to understand and build upon it. I could have my own extractors that create textures, grooms, curves, volumes, models, geometry caches, terrains… and I would simply use the OP publisher as a way to register those to the database so they are version controlled and follow the studio’s naming conventions (although even this shouldn’t be a blocker; a studio could trust their artists’ common sense and not enforce a convention). What’s clear is that we don’t need to enforce any naming conventions or workflow during the creation/extraction phase; you only apply those when you register that data in the publish.

The most barebones integration could start with these three nodes:

Generic ROP Publish (OP/Ayon Publisher)
Given a path on disk (or several, to support multiple representations), a frame range (or whether it’s a single file), a subset name, a family and the destination context… when the node gets cooked it simply copies (or hardlinks) that data to the publish directory and registers it to the database.
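
A minimal sketch of that "copy or hardlink, then register" step, assuming nothing about the real OP/AYON API: `integrate_files` is a hypothetical helper, and `register` is a caller-supplied callback standing in for whatever call would actually record the version in the database.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Iterable, List, Union

PathLike = Union[str, Path]


def integrate_files(
    sources: Iterable[PathLike],
    publish_dir: PathLike,
    register: Callable[[List[Path]], None],
) -> List[Path]:
    """Place each source file in publish_dir, then report them to `register`."""
    publish_dir = Path(publish_dir)
    publish_dir.mkdir(parents=True, exist_ok=True)

    published = []
    for src in map(Path, sources):
        dst = publish_dir / src.name
        try:
            os.link(src, dst)       # hardlink: instant and uses no extra disk space
        except OSError:
            # Hardlinking failed (e.g. different volume, unsupported filesystem,
            # or dst already exists); fall back to a regular copy.
            shutil.copy2(src, dst)
        published.append(dst)

    register(published)             # hypothetical database registration hook
    return published
```

Keeping the registration behind a callback means the same function could run from a ROP's cook, a farm task, or a standalone script.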

SOP Cache node (OP/Ayon Cacher)
Basically a node like the one @MichaelBlackbourn described in “AYON Houdini Universal Cache Node” (post #2), which internally would simply contain a ROP network that references the Generic ROP Publish, with some channel references and defaults on how to publish the generated SOP output.

SOP Loader node (OP/Ayon Loader)
Another utility SOP HDA node that has context parameter dropdowns to choose the “asset” (defaulting to the current one in the session), the “family” (it could filter to the “families” that are “SOP” supported and can be loaded with the nodes contained in this HDA), the “subset” name, the “version” and the “representation”. The OP scene manager could find all of the “loader” nodes and list the loaded versions so you can control them from the scene manager, but you would also be able to change the version directly from the version dropdown in the node’s parm interface.

Once you have that barebones layer, you can then start to assemble those low-level building blocks to streamline workflows, run validations, set naming conventions on how and where data is written and reduce user error (e.g., “Render publisher”, “Lookdev Publisher”, “Geometry publisher”, “Groom loader”…), but that should be the first layer that OP provides. Studios can then choose which layer of integration they want to use, whether they just use those barebones nodes in their custom ones or also adopt OP’s utility nodes (and the community would very easily provide new ones).

At the end of the day, this boils down to the same problem I have been raising for some time: I think the OpenPype API fails to provide, in an elegant way, the basic building blocks that developers can extend and use without writing a lot of boilerplate code and having to fit it into the pyblish plugin mechanism. Take a look at the ftrack or ShotGrid Python APIs, for example, and how simple they are to build on top of. It’s irrelevant whether you are using that API on the farm or locally, or what kind of family you are writing; the API is the same, it’s quite intuitive, and it doesn’t force you to run any validations or extractors before you are able to publish any data. You can query any type of entity published in their databases and apply very simple functions to those entities: (1) you can create/query any entity and very simply add new child entities to it (e.g., components/representations to a version), (2) you can add new metadata or modify attributes on any of the entities, (3) you can set dependencies between entities… all of those things in a very intuitive object-oriented fashion. On top of that you have the ftrack integrations or ShotGrid Toolkit, which use those APIs to provide UIs, utility tools and other things to streamline workflows, but the core API is simple and flexible.

Now try to map those same features to OP and see how you can do them. I think there’s a decent module to query entities and show the data that’s been published, but when it comes to creating new entities or manipulating existing ones, I think it fails to provide those in a simple manner, which makes it very rigid and makes flexible workflows hard to provide. The pyblish plugin framework to collect, validate, extract and publish is very useful in a lot of scenarios, but the lower-level API backend that those are built on should be easy to use and should allow you to run the “publish” directly without all of the other steps. With a few lines of code I should be able to provide a publisher to OP from any tool (e.g., publish a render by exposing a simple action in Deadline, or directly from RV after reading a render sequence).

For example, look at the TrayPublisher and this discussion we had about it here https://github.com/ynput/OpenPype/pull/5195#issuecomment-1612100673. You’d expect that tool to be one of the thinnest layers of interaction with the OP database, providing just a utility UI on top of the core OP functions where, given some existing data and some inputs, it runs a publish. And with that assumption, you’d expect to be able to reuse most of the same internal code it uses and run the same thing on the farm, or use the APIs it uses to do the same publishes elsewhere… but that could not be further from the truth: there’s so much code and so many validations specific to the TrayPublisher that most of its code is useless outside of it, and it takes quite a bit of time to understand how it’s all put together. It’s a great tool for simple one-off publishes, but it’s very cumbersome and manual if you want to do multiple publishes at the same time or call it programmatically.

fabiaserra · 30 September 2023 22:54

Houdini Publish instances

So, what a Houdini publish instance should be ?

At its simplest, lowest level there should be a ROP node that has the following parameters:

Context chooser that allows you to choose the destination “asset” where it will be published. For shows that have episode and sequence you’d expect a dropdown for each to be able to filter out the children entities further (if you have a sequence selected you only see the episodes of that sequence) but you should be able to publish at any entity level (publish to a sequence entity or even show level)
Task chooser
Family chooser
Houdini vanilla parm frame/range selector where you can choose to use current frame or set a specific frame range
Subset name (which could enforce certain validations so you can’t type whitespaces for example)
Multi-parm that contains two string parms, one to set name of the representation and the other the path pointing to where that representation lives on disk.

When you click render/publish on that node (or the node gets rendered, either locally or on the farm), it basically just calls the OP API to create the version and copy/hardlink the source data to the publish directory.

You can then wrap that same ROP node in other utility ROP nodes (Geometry publisher, Geometry Cache publisher, Render Publisher…) that basically just set defaults on the underlying ROP node so it’s easier for artists to reuse.
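
To make the parameter list above concrete, here is a sketch of the values such a node would hand to the publish call, with the kind of light validation mentioned for the subset name. `PublishParms` and its field names are hypothetical and do not correspond to any existing OP/AYON class.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PublishParms:
    """Hypothetical snapshot of the publish node's parameters."""
    asset: str                                       # context chooser
    task: str                                        # task chooser
    family: str                                      # family chooser
    subset: str                                      # subset name
    frame_range: Optional[Tuple[int, int]] = None    # None means "current frame only"
    representations: Dict[str, str] = field(default_factory=dict)  # name -> path on disk

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means ready to publish."""
        issues = []
        if not self.subset or re.search(r"\s", self.subset):
            issues.append("subset name must be non-empty and contain no whitespace")
        if self.frame_range and self.frame_range[0] > self.frame_range[1]:
            issues.append("frame range start is after its end")
        if not self.representations:
            issues.append("at least one representation path is required")
        return issues
```

A wrapper node such as a "Geometry Cache publisher" would then amount to a set of defaults for these fields (for example a fixed family and a representation path derived from the cache output).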

Publish system

How the publishing in Houdini should work ?
Should a separate plugin (a publisher for example) grab all the publish instances and publish them one by one (the current implementation)?
or
Should each publish instance publish itself without invoking the publisher? like a series of nodes (render, publish)
or
Should both be supported ?

My prior comment kind of answers this, but you shouldn’t need to collect the publish instances and orchestrate their evaluation in OP. The normal Houdini graph evaluation would handle that for you; you just provide code so that when the Publish node gets executed, it runs the publish.

I need to point out that each publish instance has a pre-process → (validations) and post-process → (extract, integrate).

The validations shouldn’t be necessary or enforced. You could provide validations when you start to provide custom extractors or wrappers on top of vanilla nodes so it runs a pre/post process on the render function of the node but we shouldn’t worry too much about this as a first MVP.

Publishing to Farm

Using the vanilla submitters/schedulers that come with Houdini for Deadline, Tractor, HQueue… you can already do most of the heavy lifting, and they have all the logic you’d want for setting dependencies between tasks. We don’t want to repeat that in OP and abstract it; just rely on those doing their job. In order to run the OP publish on the farm, you just need to make sure that when the .render() function of your HDA node gets called, it runs the OP publish functions.

for the first glance it fails, but as soon as I added essential environment variables, it worked as it were submitted from Ayon/Openpype. (I used deadline monitor to add these vars, and I don’t know how to do that from the deadline otl)

Your test simply failed because you are relying on OP’s $PATH being set by the GlobalJobPreLoad, so your executable is simply “hython”, but that’s not necessary or required for a vanilla configuration of Deadline + Houdini.

However, if you do want the OP environment variables to exist on the job running in the farm (so you can use $AVALON_WORKDIR or other things managed by your OP settings), you can set those environment variables on the submitted Deadline job by modifying the SubmitHoudiniToDeadlineFunctions.py:SubmitRenderJob of the Deadline repository and injecting these as:

                    # We do this so the GlobalJobPreLoad.py from Deadline injects the OP environment
                    # to the job and it picks up all the correct environment variables
                    fileHandle.write("EnvironmentKeyValue0=OPENPYPE_RENDER_JOB=1\n")
                    fileHandle.write("EnvironmentKeyValue1=AVALON_PROJECT={}\n".format(
                        os.getenv("AVALON_PROJECT"))
                    )
                    fileHandle.write("EnvironmentKeyValue2=AVALON_ASSET={}\n".format(
                        os.getenv("AVALON_ASSET"))
                    )
                    fileHandle.write("EnvironmentKeyValue3=AVALON_TASK={}\n".format(
                        os.getenv("AVALON_TASK"))
                    )
                    fileHandle.write("EnvironmentKeyValue4=AVALON_APP_NAME={}\n".format(
                        os.getenv("AVALON_APP_NAME"))
                    )
                    fileHandle.write("EnvironmentKeyValue5=OPENPYPE_VERSION={}\n".format(
                        os.getenv("OPENPYPE_VERSION"))
                    )
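
The same injection can also be written as a small loop, which keeps the `EnvironmentKeyValueN` indices consistent if variables are added or removed. This is a sketch, not part of the Deadline repository: `write_op_environment` and `FORWARDED_KEYS` are hypothetical names, and unlike the snippet above it skips variables that are not set instead of writing the string "None".

```python
import os
from typing import Mapping, Optional, TextIO

# Variables forwarded from the submitting session (same ones as the snippet above).
FORWARDED_KEYS = (
    "AVALON_PROJECT",
    "AVALON_ASSET",
    "AVALON_TASK",
    "AVALON_APP_NAME",
    "OPENPYPE_VERSION",
)


def write_op_environment(
    file_handle: TextIO,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Write EnvironmentKeyValueN lines to a job info file; return how many."""
    env = os.environ if env is None else env

    # OPENPYPE_RENDER_JOB=1 always comes first, as in the original snippet.
    pairs = [("OPENPYPE_RENDER_JOB", "1")]
    pairs += [(key, env[key]) for key in FORWARDED_KEYS if key in env]

    for index, (key, value) in enumerate(pairs):
        file_handle.write(f"EnvironmentKeyValue{index}={key}={value}\n")
    return len(pairs)
```

Inside `SubmitRenderJob` this would be called with the already open `fileHandle`, provided no other `EnvironmentKeyValue` entries with the same indices are written elsewhere in that file.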

fabiaserra · 1 October 2023 01:33

Thinking about this more in practical terms, without a big refactor of the code and creating more APIs… I think that if we made the publishers allow the “use existing” workflow that we have in Nuke, we could maybe close some of the gap we currently have between using the vanilla Deadline submitter to render/generate outputs procedurally and the OP publisher. With “use existing”, artists would work the normal way in Deadline and then, once they have checked the outputs, open their scene again and run the OP publisher, which would simply validate that the outputs of the OP nodes exist and publish them. That publish process shouldn’t take long and could even run locally, as it’s basically validating and integrating the files, and the same code we wrote for Nuke to “use existing frames (farm)” would work the same way here. It’s not a great workflow, because ideally we would also want to be able to run the publish in the same Deadline submission in certain cases, and what I described with the ROP Publish node would give the OP Houdini integration a lot of power (think of Solaris being a very native implementation of the USD API, but for OP’s API), but it’s not too bad of a compromise as a starting point?
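
The "validate that the outputs exist" part of that workflow is cheap to sketch. The helper below is hypothetical (it is not the code written for Nuke) and assumes the output path is given as a Python format string with a `frame` field rather than Houdini's `$F4` syntax.

```python
from pathlib import Path
from typing import List


def missing_frames(pattern: str, start: int, end: int) -> List[int]:
    """Return the frames in [start, end] whose output file is not on disk.

    `pattern` is a Python format string such as "/renders/beauty.{frame:04d}.exr".
    """
    return [
        frame
        for frame in range(start, end + 1)
        if not Path(pattern.format(frame=frame)).is_file()
    ]
```

A "use existing" publish would refuse to integrate (or warn the artist) whenever this list is non-empty, and otherwise go straight to registering the files.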

I think the current loaders and scene management in Houdini (while not a very “Houdini-esque” workflow) are pretty good, so I don’t think we need that SOP Loader node, but it would be great to provide a good SOP Cacher node, as that’s the bread and butter of a lot of Houdini workflows.

MichaelBlackbourn · 1 October 2023 06:12

Fabiaserra, I think I should run a demo of our current pipe for you or a few people. It works very much how I think things could work. :) Our main problem is just that our stuff is based on ideas that are about 5-7 years old, and I really want to swap out the core of it for something more accessible.

mustafa_jafar · 3 October 2023 16:42

Man, that’s awesome, I added our thoughts in my note.

I have a question:

you can set those environment variables on the submitted Deadline job by modifying the SubmitHoudiniToDeadlineFunctions.py

Are there alternative ways to do so?
Because we don’t want to maintain the Deadline Houdini plugin.
I thought about generating a temporary JSON file and making GlobalJobPreLoad.py read it and print a useful message in the Deadline job log.
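
A rough sketch of that JSON hand-off idea, kept independent of Deadline's scripting API: `dump_job_environment` would run at submission time and `apply_job_environment` inside the preload script. Both names are hypothetical, and `set_env` / `log` are callbacks standing in for whatever the Deadline plugin object provides for setting process environment variables and writing to the job log.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Union


def dump_job_environment(path: Union[str, Path], env: Dict[str, str]) -> None:
    """Submission side: write the variables the job should receive."""
    Path(path).write_text(json.dumps(env, indent=2), encoding="utf-8")


def apply_job_environment(
    path: Union[str, Path],
    set_env: Callable[[str, str], None],
    log: Callable[[str], None],
) -> int:
    """Preload side: read the file, apply each variable, return how many."""
    path = Path(path)
    if not path.is_file():
        log(f"No environment file at {path}; leaving the job environment untouched")
        return 0

    env = json.loads(path.read_text(encoding="utf-8"))
    for key, value in sorted(env.items()):
        set_env(key, str(value))
        log(f"Injected {key}={value}")
    return len(env)
```

The open question this does not answer is how the job learns the file's location; it would need to sit at a path both the submitter and the worker can derive, for example next to the hip file on shared storage.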

for reference: I made two posts, one for the cache node and another for the load node

neshamota · 12 October 2023 19:42

Hi, great article!
For our kind of work it would be cool if the pipeline could be modular. For example, we use Houdini for all the character work, and KineFX has its own loaders/ROPs for these: FBX character import/ROP, FBX animation, etc.
It would be great if we could hook these into AYON’s loaders/publishers.