To Pyblish or not to Pyblish?

max.pareschi · 27 October 2023 22:26

So, I’m just throwing it up in the air… don’t you feel that pyblish is kinda starting to limit things in the long run? Especially with AYON.

There’s a bunch of magic logic inside there, and whatever abstraction you may want (the dataclass post I saw is the stuff of dreams) still needs to be massaged into dicts to get passed to subsequent steps.
I also feel that the serial nature of the collect and validate steps can be, and is, misused in a bunch of different ways, leading to a lot of edge cases and opinionated workflows (we do use it for our stuff, but it still stinks).

It works pretty well for Maya-like apps, but Houdini feels a bit stifled. Not sure about Gaffer, but the concept is the same. Nuke too, even if you basically just render stuff there 90% of the time.

Not that I have better ideas, but still wanted to share my thoughts.

munkybutt · 28 October 2023 06:45

I kind of agree but there doesn’t seem to be much of an alternative. I floated the idea on discord that Ynput create their own fork of pyblish as it is as close to unsupported as a repo gets (loads of outstanding PRs and ones that get through take aaaages).

There are lots of changes that I would love to see make it into the codebase but it doesn’t feel worth the effort with the current state of the repo.

Some changes I have made locally at other jobs:

ability to sort plugin order via the UI and save this off for re-use.
added a processing stage before validation and after collection. This would fix any issues that always needed to be fixed in a scene. Failing publication because the scene is missing some metadata that should always be there is endlessly frustrating to the artists I have worked with.
modified the internals of pyblish to fix the way plugins are loaded so they can be run like normal python code rather than the text evaluation that they currently are. This made it possible to properly debug plugins.
turned off the log spam
type annotations!

I think dropping pyblish would be less beneficial than reviving it.

max.pareschi · 28 October 2023 07:48

Well, awesome changes on your end!

The thing I hate the most about it is the rigidity.
Every plugin needs to be disabled or enabled in the system settings, and while you can configure some, you need to change the code to actually customize. Which is ok, but I’m wondering if some sort of a nodal approach on configuration would be better.

Actually a whole production can be described as a dag. The output port of every task could very well hide a subnetwork that represents the publish chain of plug-ins in a nodal form. Or maybe better, you could do rules by family? So you just need to override them on the dag where needed. Makes more sense to me.

Aside from this rant, I agree with you, I just feel it’s kind of blackboxish atm.

munkybutt · 28 October 2023 10:40

Yes it basically needs a new major version taking in all the lessons learned so far.
I suggested having this on the repo and was politely shut down

munkybutt · 28 October 2023 10:42

I actually know of at least one person who has a pyblish alternative in the making - but they haven’t made it public yet.

tokestuartjepsen · 28 October 2023 20:52

I think one of the major issues with Pyblish plugins is relying solely on a float number for the order of execution.

Very often plugins rely on each other to reduce data collection and code duplication. This leads to rigid ordering of plugins which is often undocumented so a plugin’s dependencies are unknown.

What I have suggested in the past for Pyblish is for plugins to chain together explicitly via a class attribute. You can kinda hack this into the current Pyblish version by importing plugins into each other and piggy-backing on the order number, but it’s not pretty.
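For anyone who hasn’t seen that hack, it might look roughly like this. This is only a sketch with plain stand-in classes instead of real `pyblish.api` plug-ins; `Plugin`, `CollectScene`, `CollectCameras` and the 0.01 step are made-up names and values for illustration.

```python
# Stand-in base class; a real plug-in would subclass
# pyblish.api.ContextPlugin or pyblish.api.InstancePlugin instead.
class Plugin:
    order = 0.0


class CollectScene(Plugin):
    order = 0.0  # assumed collector order for this sketch

    def process(self, context):
        context["scene"] = "shot010.ma"


class CollectCameras(Plugin):
    # Piggy-back on the other plug-in's order instead of hard-coding a
    # float, so the relationship is at least visible in the code.
    order = CollectScene.order + 0.01

    def process(self, context):
        # Relies on data written by CollectScene
        context["cameras"] = [context["scene"] + ":camera1"]


# Run the plug-ins sorted by order, as a publish would
context = {}
for plugin in sorted([CollectCameras, CollectScene], key=lambda p: p.order):
    plugin().process(context)

print(context["cameras"])
```

It works, but the dependency is still only implied by an import and a magic offset, which is why an explicit attribute would be nicer.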

munkybutt · 29 October 2023 05:59

Yeah, the pyblish philosophy is that plugins shouldn’t have interdependencies on each other, but that is not practical in real world situations.

I brought all this up on the repo and the reception was lukewarm at best. They were eventually happy for the changes to maybe be introduced, but it took months for my first PR to be accepted and it was a super small critical fix. I decided not to do any more as there was a high chance of the PR being left open and unmerged.

BigRoy · 29 October 2023 21:56

I remember when initially Marcus and I sat down to discuss the design for Pyblish when he started building it (partially based on some funding from our end actually). It’s a long time ago. Initially we prototyped the idea with a node based system - since everything around us was node graphs in our work - but felt that it’d hold back on the simplicity of Pyblish design philosophy. It would have e.g. required at least a UI tool to manage plug-in dependencies since it’s non-trivial to describe node graphs by code.

Pyblish define order by `dependencies`

By the way, nothing is actually stopping you from setting the `order` of plugins based on their dependencies. Consider e.g. doing this:

Say we define plug-ins like this:

import pyblish.api


class A(pyblish.api.ContextPlugin):
    order = pyblish.api.CollectorOrder

    def process(self, context):
        context.data["A"] = True
        print("A")

class B(pyblish.api.ContextPlugin):
    order = pyblish.api.CollectorOrder
    dependencies = ["A"]

    def process(self, context):
        assert context.data["A"] is True
        context.data["B"] = True
        print("B")

class C(pyblish.api.ContextPlugin):
    order = pyblish.api.CollectorOrder
    dependencies = ["A", "B"]

    def process(self, context):
        assert context.data["A"] is True
        assert context.data["B"] is True
        context.data["C"] = True
        print("C")

class D(pyblish.api.ContextPlugin):
    order = pyblish.api.CollectorOrder

    def process(self, context):
        context.data["D"] = True
        print("D")

And use a sorting algorithm, e.g. this:

from collections import defaultdict, namedtuple

Results = namedtuple('Results', ['sorted', 'cyclic'])


def topological_sort(dependency_pairs):
    """Sort values subject to dependency constraints"""
    num_heads = defaultdict(int)  # num arrows pointing in
    tails = defaultdict(list)  # list of arrows going out
    heads = []  # unique list of heads in order first seen
    for h, t in dependency_pairs:
        num_heads[t] += 1
        if h in tails:
            tails[h].append(t)
        else:
            tails[h] = [t]
            heads.append(h)

    ordered = [h for h in heads if h not in num_heads]
    for h in ordered:
        for t in tails[h]:
            num_heads[t] -= 1
            if not num_heads[t]:
                ordered.append(t)
    cyclic = [n for n, heads in num_heads.items() if heads]
    return Results(ordered, cyclic)

Then we can register a discovery filter to do some magic.

def shift_order_for_dependencies(plugins):
    """Discovery filter that pushes each plug-in just after its dependencies."""
    offset = 0.0001  # amount of offset applied from dependencies

    def get_dependencies(plugin):
        return getattr(plugin, "dependencies", [])

    # Discovery filters receive plug-in classes, so use the class name
    plugins_by_name = {plugin.__name__: plugin for plugin in plugins}
    # (dependency, dependent) pairs: the dependency has to come first
    dependency_pairs = [
        (dependency_name, name)
        for name, plugin in plugins_by_name.items()
        for dependency_name in get_dependencies(plugin)
    ]
    result = topological_sort(dependency_pairs)
    assert not result.cyclic, "Not allowed to have cyclic dependencies"

    # Apply offsets in sorted order
    for plugin_name in result.sorted:
        plugin = plugins_by_name[plugin_name]
        for dependency_name in get_dependencies(plugin):
            dependency_plugin = plugins_by_name[dependency_name]
            # >= so plug-ins sharing the same order still get shifted
            if dependency_plugin.order >= plugin.order:
                # todo: we might want to validate it does not move into another order
                #   e.g. avoid pushing it from Collector to Validator
                plugin.order = dependency_plugin.order + offset

    # Re-sort in place in case discovery sorted before running the filters
    plugins.sort(key=lambda plugin: plugin.order)
    return plugins

pyblish.api.register_discovery_filter(shift_order_for_dependencies)
# Now publish away
# Now publish away

Note that I didn’t test run this code at all - so consider it quick pseudocode.
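If you are on Python 3.9+ the hand-rolled sort could probably be swapped for the standard library’s `graphlib`. Below is a rough standalone sketch of the same idea; it uses plain stand-in classes instead of real Pyblish plug-ins, and `sort_by_dependencies` is just a name picked for this example, so treat it as an illustration rather than something wired into Pyblish.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Plain stand-ins for plug-in classes that all share one order value
class A:
    order = 0.0


class B:
    order = 0.0
    dependencies = ["A"]


class C:
    order = 0.0
    dependencies = ["A", "B"]


class D:
    order = 0.0


def sort_by_dependencies(plugins, offset=0.0001):
    """Shift each plug-in's order to land just after its dependencies."""
    by_name = {plugin.__name__: plugin for plugin in plugins}
    # Map each plug-in name to the set of names it depends on
    graph = {
        name: set(getattr(plugin, "dependencies", []))
        for name, plugin in by_name.items()
    }
    # static_order() yields dependencies before their dependents and
    # raises graphlib.CycleError if the dependencies are cyclic.
    for name in TopologicalSorter(graph).static_order():
        plugin = by_name[name]
        for dependency_name in graph[name]:
            dependency = by_name[dependency_name]
            if dependency.order >= plugin.order:
                plugin.order = dependency.order + offset
    return sorted(plugins, key=lambda plugin: plugin.order)


ordered = sort_by_dependencies([D, C, B, A])
print([plugin.__name__ for plugin in ordered])
```

A dependency name that doesn’t match any discovered plug-in will raise a `KeyError` here; a real version would want a clearer error for that.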

Visualizing data during publishing

Having said that - I think the bigger issue is really a visual debugging problem more than anything else. It’s too bad that it’s non-trivial to see how values change over time, what changes them, what uses which data, etc. That actually isn’t even that hard to do - look for example at this old prototype I had quickly put together that shows what data was added and changed by a plugin.

Nothing is stopping us from swapping out the actual instance.data behavior with something that also emits a notification whenever data is accessed, so that could be visualized as well.
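As a rough idea of what such a replacement could look like, here is a small dict subclass that records reads and writes. `TrackedData` and its `events` list are hypothetical names for this sketch; how it would actually be plugged into `instance.data` is not shown and would need checking against Pyblish itself.

```python
class TrackedData(dict):
    """Dict that records which keys were read, added or changed."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.events = []  # list of (action, key) tuples in call order

    def __getitem__(self, key):
        self.events.append(("read", key))
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        # Distinguish new keys from overwritten ones
        action = "changed" if key in self else "added"
        self.events.append((action, key))
        super().__setitem__(key, value)


data = TrackedData(frameStart=1001)
data["frameEnd"] = 1010      # new key
data["frameStart"] = 1       # existing key overwritten
_ = data["frameEnd"]         # read access
print(data.events)
```

Note that `dict` methods such as `get()`, `update()` and `setdefault()` bypass these overrides, so a real implementation would need to wrap those too. A debugging UI could snapshot `events` after each plugin runs to show who touched what.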

fabiaserra · 9 November 2023 15:58

I can relate to the issues this thread is raising. I really like the design of Pyblish and the ideas behind the plugin framework, but I have also found it too rigid and “blackboxy” in many situations when developing for OpenPype. That is probably not so much Pyblish’s fault as a result of how OP has evolved on top of it, with many hands adding band-aids over band-aids in many places (a lot of similarities with the disadvantages of micro-services frameworks).

If the plugins were written by reusing basic core API functions that don’t rely on the Pyblish plugin system, and that are 99.999% robust, well maintained and documented, I think this would be a MUCH smaller problem. But a lot of times I have found myself needing to frankenstein multiple parts of the plugins because they all depend on the preceding plugins in the framework, while being scared of breaking other workflows… and I keep seeing PRs that duplicate existing code over and over as well. Because of this it has been quite a hassle to fit a few of the tools I have written into the framework (maybe due to my lack of a deep understanding of the system), so I have ended up creating my own abstractions and working around Pyblish.

A lot of the issue, I think, also derives from the problem that @BigRoy raised: the lack of debugging and visualization tools. With those it would be easier to clean up the system and know exactly which data actually needs to be passed along between plugins, instead of guesstimating the required order and trial-and-erroring the families/task filtering flow until you get it to work haha

Anyway, enough of me rambling, I will go grab my first coffee

max.pareschi · 31 October 2024 19:36

Now that we have some pros and cons, what I think it boils down to is:

Dependency system
A lot of the stuff we do in plugins is either enabling or disabling behavior depending on instance data. A lot of things could just be simplified by dependency activation or exclusion (like farm mode). The current system relies a lot on families, which are cool, but juggling family types just to let plugins behave misses the point of having reliable data.
Ayon publish API
If you feel like reinventing the wheel it’s ok to do so, but there should be just one orthodox way of getting data. Like timecode or farm mode.
Debugging tools
If we have dependencies, the graph add-on is actually already suited to visualize that data, so you can know which connection goes where. Or maybe just a node tree visualization. Whatever fits really, but the hardest parts to debug right now are the ones enabled by the pyblish design, which is multi in, multi out.
Plugin standardization
All in all, every plugin is just an uber function. It expects data and spits out data. Every plugin should be required to have a common interface that describes its inputs and outputs. This will also be needed to have clear dependencies.
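To make the last two points a bit more concrete, here is a toy sketch of plugins that declare their inputs and outputs, with a runner that derives the execution order from those declarations. None of this is existing Pyblish or AYON API; `PluginSpec`, `run` and the example plugins are invented for illustration.

```python
class PluginSpec:
    """Hypothetical common interface: declare what you read and write."""
    inputs = ()
    outputs = ()

    def process(self, data):
        raise NotImplementedError


class CollectFrameRange(PluginSpec):
    outputs = ("frame_start", "frame_end")

    def process(self, data):
        return {"frame_start": 1001, "frame_end": 1010}


class CollectFrameCount(PluginSpec):
    inputs = ("frame_start", "frame_end")
    outputs = ("frame_count",)

    def process(self, data):
        return {"frame_count": data["frame_end"] - data["frame_start"] + 1}


def run(plugins):
    """Run plugins as soon as all of their declared inputs are available."""
    data = {}
    pending = list(plugins)
    while pending:
        ready = [p for p in pending if all(k in data for k in p.inputs)]
        if not ready:
            names = [p.__name__ for p in pending]
            raise RuntimeError("Unsatisfied inputs for: %s" % names)
        for plugin in ready:
            # Each plugin only sees the data it declared as input
            result = plugin().process({k: data[k] for k in plugin.inputs})
            undeclared = set(result) - set(plugin.outputs)
            if undeclared:
                raise RuntimeError("Undeclared outputs: %s" % sorted(undeclared))
            data.update(result)
            pending.remove(plugin)
    return data


data = run([CollectFrameCount, CollectFrameRange])
print(data["frame_count"])
```

With declarations like these the dependency graph falls out of the plugin definitions, so no float ordering is needed and a missing input fails loudly instead of silently.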

Any thoughts?

mustafa_jafar · 1 November 2024 17:41

Hello,
Let me provide some insights about some current dev progress and suggested ideas.

We have a community post where we mentioned some debugging techniques for pyblish plugins: Pyblish Plugins Debugging

Also, as a big fan of @BigRoy’s publish debug stepper, I’d like to mention it was added as an experimental feature in Ayon Core in this PR: Experimental tool: Pyblish debug stepper #753 | Core Addon

Some of the publish logic lives in integrate.py which copies the files via and registers them in the DB.

There’s also a PR that implements an approach to “lower level publishing” which goes through pyblish. I think pyblish is hard to avoid since the whole pipeline is implemented as a set of pyblish plugins.

So, we may have a high-level API interface that goes like

# The system should end up allowing us to do AYON publishing in a simple way WITH type hinting across the board.
context = ayon.CreateContext()
instance = context.create(
    variant="Main",
    product_type="texture",
    files=["/path/to/texture.exr"],
    traits=[
        FrameRange(start=1001, end=1010, frame_list=[1001, 1005]),
        OCIO(path="ocio.config", colorspace="ACEScg"),
        BurninMetadata(camera_name="XYZ", username="roy"),
        TagsMetadata(),
    ]
)
context.publish()

I believe @BigRoy can share better insights.
