Enroll endpoint error

Hello,

Our ftrack processor running on ASH stopped working with a “Connection timed out” error. After some investigation with the help of service_tools, we were able to identify that the /api/enrol end point is not working.

I looked at our server log and found the following error.

2024-10-17 12:20:07 ERROR      anonymous       [POST] /enroll: UNHANDLED EXCEPTION
2024-10-17 12:20:07 ERROR      server          cannot call Transaction.__aexit__(): the underlying connection is closed
[2024-10-17 12:20:07 +1000] [13] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/backend/ayon_server/lib/postgres.py", line 150, in iterate
    async for record in statement.cursor(*args):
  File "/usr/local/lib/python3.11/site-packages/asyncpg/cursor.py", line 234, in __anext__
    buffer = await self._bind_exec(self._prefetch, self._timeout)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/asyncpg/cursor.py", line 129, in _bind_exec
    buffer, _, self._exhausted = await protocol.bind_execute(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "asyncpg/protocol/protocol.pyx", line 207, in bind_execute
asyncpg.exceptions.ConnectionDoesNotExistError: connection was closed in the middle of operation

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/backend/api/events/enroll.py", line 107, in enroll
    res = await enroll_job(
          ^^^^^^^^^^^^^^^^^
  File "/backend/ayon_server/events/enroll.py", line 87, in enroll_job
    async for row in Postgres.iterate(
  File "/backend/ayon_server/lib/postgres.py", line 148, in iterate
    async with connection.transaction():
  File "/usr/local/lib/python3.11/site-packages/asyncpg/transaction.py", line 72, in __aexit__
    self._check_conn_validity('__aexit__')
  File "/usr/local/lib/python3.11/site-packages/asyncpg/connresource.py", line 41, in _check_conn_validity
    raise exceptions.InterfaceError(
asyncpg.exceptions._base.InterfaceError: cannot call Transaction.__aexit__(): the underlying connection is closed

and

2024-10-17 12:20:07 ERROR      server          Exception!

    Traceback (most recent call last):
      File "/backend/ayon_server/api/messaging.py", line 132, in run
        raw_message = await self.pubsub.get_message(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py", line 1045, in get_message
        response = await self.parse_response(block=(timeout is None), timeout=timeout)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py", line 918, in parse_response
        response = await self._execute(
                   ^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py", line 898, in _execute
        return await conn.retry.call_with_retry(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/redis/asyncio/retry.py", line 62, in call_with_retry
        await fail(error)
      File "/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py", line 887, in _disconnect_raise_connect
        raise error
      File "/usr/local/lib/python3.11/site-packages/redis/asyncio/retry.py", line 59, in call_with_retry
        return await do()
               ^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py", line 534, in read_response
        response = await self._parser.read_response(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/redis/_parsers/resp2.py", line 82, in read_response
        response = await self._read_response(disable_decoding=disable_decoding)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/redis/_parsers/resp2.py", line 90, in _read_response
        raw = await self._readline()
              ^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/redis/_parsers/base.py", line 221, in _readline
        raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
    redis.exceptions.ConnectionError: Connection closed by server.

We would much appreciate if anyone can provide us with some hint of possible cause to this issue.

We are using AYON 1.4.2+202409231943.

Regards,
Tu

Hello,

I think the issue might be in the postgres db.

This query from enroll_job function in backend/ayon_server/events/enroll.py is taking forever to execute.

query = """
 WITH excluded_events AS (
	SELECT depends_on
	FROM events
	WHERE topic = 'ftrack.proc'
	AND (
		status = 'finished'
	OR (status = 'failed' AND retries > 3)
	)
 ),
 source_events AS (
	SELECT *
	FROM events
	WHERE topic LIKE 'ftrack.leech'
	AND status = 'finished'
	AND id NOT IN (SELECT depends_on FROM excluded_events)
 )
 SELECT
	source_events.id AS source_id,
	target_events.status AS target_status,
	target_events.sender AS target_sender,
	target_events.retries AS target_retries,
	target_events.hash AS target_hash,
	target_events.retries AS target_retries,
	target_events.id AS target_id
 FROM
	source_events
	LEFT JOIN events AS target_events
	ON
	target_events.depends_on = source_events.id
	AND target_events.topic = 'ftrack.proc'
 WHERE
	TRUE
 ORDER BY
	source_events.created_at ASC
	LIMIT 1000
"""

I’m not sure if this is recommended. However, clearning the event table of the ftrack.proc and ftrack.leecher topic fixed this issue

"""
    DELETE FROM events
    WHERE topic IN ('ftrack.proc', 'ftrack.leech')
"""

Hello. I’m optimizing the enroll endpoint to prevent this. Clearing old events is indeed a good option - you can do that automatically by setting
AYON_EVENT_RETENTION_DAYS environment variable of the docker container- a sane value seems to be 30 - 90 depending on the traffic.

As project related activities are materialized in the activity feed, it is not strictly necessary to keep events indefinitely:

    environment:
      - "AYON_EVENT_RETENTION_DAYS=60"
1 Like

BTW: I’m working on more permanent solution Events: Optimize enrolls to avoid timeouts by martastain · Pull Request #396 · ynput/ayon-backend · GitHub

1 Like

Thank you @martin.wacker.