Generic Queue mechanism for heavy load operations

Izenda implements a scalable Task queue so that alerts and subscriptions scheduled to start at the same time are executed sequentially, which keeps the load on the server under control.

The Queue mechanism makes it possible to execute heavy-load operations:

  • Sequentially, one by one, using a strictly predetermined number of threads, so that the Izenda instance (or service node) is not overloaded
  • With both embedded (native) workers and easily implemented external workers, so that job-execution load can be distributed across dynamically spun-up services/nodes
  • In a fail-safe manner:
    • Failed/crashed workers are re-instantiated;
    • Jobs terminated halfway are executed again;
    • Jobs that exceed the execution timeout get a “timedOut” flag, so that they are not retried.
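
The recovery rules in the last bullet can be sketched as a small classification step. This is only an illustration: the names InFlightTask and recover, the use of worker liveness to detect a job that was terminated halfway, and the numeric timeout are assumptions, not Izenda's actual implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class InFlightTask:
    """Hypothetical record of a task that a worker has picked up."""
    task_id: str
    worker_id: str
    started_at: float        # seconds on any consistent clock
    timed_out: bool = False  # mirrors the "timedOut" flag


def recover(
    tasks: Iterable[InFlightTask],
    alive_workers: Set[str],
    now: float,
    execution_timeout: float,
) -> Tuple[List[InFlightTask], List[InFlightTask], List[InFlightTask]]:
    """Split in-flight tasks into (still_running, to_retry, timed_out)."""
    still_running, to_retry, timed_out = [], [], []
    for task in tasks:
        if now - task.started_at > execution_timeout:
            # Exceeded the timeout: flag it so it is never retried.
            task.timed_out = True
            timed_out.append(task)
        elif task.worker_id not in alive_workers:
            # Assumption: a dead worker means the job stopped halfway,
            # so the job is eligible to be executed again.
            to_retry.append(task)
        else:
            still_running.append(task)
    return still_running, to_retry, timed_out
```

The timeout check comes first, so a job that has timed out is flagged rather than retried even if its worker has also crashed.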

The Queue mechanism consists of two stores (tables) in the Repository DB.

The first is the Queued Tasks store, which both holds Tasks and orders (prioritizes) them for sequential execution. It currently behaves as a simple FIFO buffer, so Tasks are dispatched for processing in the order in which they were inserted into the Queue.
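
A minimal in-memory sketch of this FIFO behavior follows. The class and field names (FifoTaskQueue, QueuedTask) are hypothetical, and the real store is a table in the Repository DB rather than a Python deque.

```python
import uuid
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class QueuedTask:
    """Hypothetical row of the Queued Tasks store."""
    id: uuid.UUID
    job_kind: str
    job_entity_id: uuid.UUID
    job_data: str


class FifoTaskQueue:
    """Dispatches tasks strictly in insertion order."""

    def __init__(self) -> None:
        self._tasks: Deque[QueuedTask] = deque()

    def enqueue(self, job_kind: str, job_entity_id: uuid.UUID,
                job_data: str = "") -> uuid.UUID:
        task = QueuedTask(uuid.uuid4(), job_kind, job_entity_id, job_data)
        self._tasks.append(task)      # newest task goes to the back
        return task.id

    def dequeue(self) -> Optional[QueuedTask]:
        # The oldest task leaves from the front; None when the queue is empty.
        return self._tasks.popleft() if self._tasks else None
```

Because ordering lives entirely in this store, a later move from FIFO to a priority scheme would only change how dequeue picks its task.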

The second is the Registered Workers store, which holds the IDs of “known” Workers and tracks their health through the regular “alive” pings that Workers send. A Worker can be any service/node/routine/webapp that is able to process some Izenda jobs and implements the simplest interactions with the Queue Logic module in Izenda. Each Worker can request and serve one kind of Task/Job, several kinds, or all of them, depending on its implementation.

In addition, the Izenda backend contains internal (embedded) Workers, which can process all kinds of Tasks/Jobs when the customer does not want to deploy external services/Workers.
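
The health tracking done by the Registered Workers store can be illustrated with a small sketch. The class name, the validity_seconds parameter, and the explicit now argument are assumptions made to keep the example deterministic; the actual validity period comes from the Queue mechanism configuration.

```python
import uuid
from typing import Dict


class RegisteredWorkers:
    """Hypothetical in-memory stand-in for the Registered Workers table."""

    def __init__(self, validity_seconds: float) -> None:
        self._validity = validity_seconds
        self._valid_until: Dict[uuid.UUID, float] = {}  # worker id -> deadline

    def register(self, now: float) -> uuid.UUID:
        worker_id = uuid.uuid4()
        self._valid_until[worker_id] = now + self._validity
        return worker_id

    def is_valid(self, worker_id: uuid.UUID, now: float) -> bool:
        deadline = self._valid_until.get(worker_id)
        return deadline is not None and now <= deadline

    def alive_ping(self, worker_id: uuid.UUID, now: float) -> bool:
        if not self.is_valid(worker_id, now):
            # Unknown or expired: forget the worker so that it re-registers.
            self._valid_until.pop(worker_id, None)
            return False
        # Healthy: extend validity for another period.
        self._valid_until[worker_id] = now + self._validity
        return True
```

A Worker that misses its ping window gets False on the next ping and is expected to recycle, that is, to register again under a new Id.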

The Queue Logic module in the Izenda backend exposes the following methods (endpoints) for Workers:

  • Enqueue Task
    • It accepts (enum) JobKind, (Guid) JobEntityId and (string) JobData values, which describe the job to be performed.
    • Note that Workers will sometimes need to pull additional data themselves, for example the Subscription object from the WebAPI / Repository by the Id provided in JobEntityId.
    • It returns the (Guid) Id of the enqueued Task, to allow for its progress tracking.
  • Register Worker
    • It accepts an empty request and returns the (Guid) Id that identifies the given Worker.
  • Alive Ping
    • It accepts a request with the (Guid) Id identifying the given Worker, and prolongs that Worker's “health / validity” for another period of time (specified in the Queue mechanism configuration).
    • It returns (bool) True if the Worker's validity was successfully prolonged, and (bool) False if the Worker is considered expired / corrupted and should recycle.
  • Dequeue Task
    • It accepts a request with the Worker-identifying (Guid) Id and checks the Worker's “health / validity” in the RegisteredWorkers table. If the Worker is valid, it picks the “oldest” Task from the QueueTasks table and returns it in the response for the given Worker to process.
  • Set Job Execution Outcome
    • It accepts the (Guid) Id of the Task, the (enum) Outcome (which can be Succeeded, Failed, or TimedOut), and the (Guid) Id of the Worker (because some Worker implementations may dequeue and process several Tasks simultaneously).
    • It then checks the validity of the given Worker and, if the Worker is valid, updates the Task, effectively removing it from the Queue.
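
To show how the five methods fit together, here is a simplified in-memory model. It is a sketch under several assumptions, not the Izenda API: the Python method names, the JobKind members, the bool returned by set_job_execution_outcome, and the check that the reporting Worker is the one that dequeued the Task are all hypothetical, and Worker expiry is left out, so alive_ping only checks registration.

```python
import uuid
from collections import OrderedDict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Set


class JobKind(Enum):       # member names are assumptions
    ALERT = "Alert"
    SUBSCRIPTION = "Subscription"


class Outcome(Enum):       # the three outcomes named in the text
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    TIMED_OUT = "TimedOut"


@dataclass
class Task:
    id: uuid.UUID
    job_kind: JobKind
    job_entity_id: uuid.UUID
    job_data: str
    worker_id: Optional[uuid.UUID] = None  # set when the task is dequeued
    outcome: Optional[Outcome] = None


class QueueLogic:
    def __init__(self) -> None:
        # OrderedDict keeps insertion order, which gives FIFO dispatch.
        self._queued: "OrderedDict[uuid.UUID, Task]" = OrderedDict()
        self._in_progress: Dict[uuid.UUID, Task] = {}
        self._workers: Set[uuid.UUID] = set()

    def enqueue_task(self, job_kind: JobKind, job_entity_id: uuid.UUID,
                     job_data: str) -> uuid.UUID:
        task = Task(uuid.uuid4(), job_kind, job_entity_id, job_data)
        self._queued[task.id] = task
        return task.id                     # lets the caller track progress

    def register_worker(self) -> uuid.UUID:
        worker_id = uuid.uuid4()
        self._workers.add(worker_id)
        return worker_id

    def alive_ping(self, worker_id: uuid.UUID) -> bool:
        return worker_id in self._workers  # expiry omitted in this sketch

    def dequeue_task(self, worker_id: uuid.UUID) -> Optional[Task]:
        if worker_id not in self._workers or not self._queued:
            return None
        _, task = self._queued.popitem(last=False)  # oldest task first
        task.worker_id = worker_id
        self._in_progress[task.id] = task
        return task

    def set_job_execution_outcome(self, task_id: uuid.UUID, outcome: Outcome,
                                  worker_id: uuid.UUID) -> bool:
        task = self._in_progress.get(task_id)
        if (worker_id not in self._workers or task is None
                or task.worker_id != worker_id):
            return False
        task.outcome = outcome
        del self._in_progress[task_id]     # the task leaves the queue
        return True
```

A Worker built against this model would call register_worker once, then repeatedly call dequeue_task, process the returned Task, and report the result with set_job_execution_outcome, sending alive_ping at regular intervals in between.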