Asked  7 Months ago    Answers:  5   Viewed   31 times

I work on a somewhat large web application, and the backend is mostly in PHP. There are several places in the code where I need to complete some task, but I don't want to make the user wait for the result. For example, when creating a new account, I need to send them a welcome email. But when they hit the 'Finish Registration' button, I don't want to make them wait until the email is actually sent, I just want to start the process, and return a message to the user right away.

Up until now, in some places I've been using what feels like a hack with exec(). Basically doing things like:

exec("doTask.php $arg1 $arg2 $arg3 >/dev/null 2>&1 &");

Which appears to work, but I'm wondering if there's a better way. I'm considering writing a system which queues up tasks in a MySQL table, and a separate long-running PHP script that queries that table once a second, and executes any new tasks it finds. This would also have the advantage of letting me split the tasks among several worker machines in the future if I needed to.

Am I re-inventing the wheel? Is there a better solution than the exec() hack or the MySQL queue?

 Answers

70

I've used the queuing approach, and it works well as you can defer that processing until your server load is idle, letting you manage your load quite effectively if you can partition off "tasks which aren't urgent" easily.

Rolling your own isn't too tricky, here's a few other options to check out:

  • GearMan - this answer was written in 2009, and since then GearMan looks a popular option, see comments below.
  • ActiveMQ if you want a full blown open source message queue.
  • ZeroMQ - this is a pretty cool socket library which makes it easy to write distributed code without having to worry too much about the socket programming itself. You could use it for message queuing on a single host - you would simply have your webapp push something to a queue that a continuously running console app would consume at the next suitable opportunity
  • beanstalkd - only found this one while writing this answer, but looks interesting
  • dropr is a PHP based message queue project, but hasn't been actively maintained since Sep 2010
  • php-enqueue is a recently (2017) maintained wrapper around a variety of queue systems
  • Finally, a blog post about using memcached for message queuing

Another, perhaps simpler, approach is to use ignore_user_abort - once you've sent the page to the user, you can do your final processing without fear of premature termination, though this does have the effect of appearing to prolong the page load from the user perspective.

Wednesday, March 31, 2021
 
PeterTheLobster
answered 7 Months ago
74

This:

/var/www/src/Tests/../../vendor/ezsystems/ezpublish-kernel/config.php

is translated to:

/var/www/vendor/ezsystems/ezpublish-kernel/config.php

Because of ../..

You are not looking there.

Saturday, May 29, 2021
 
PLPeeters
answered 5 Months ago
88

I've had excellent results with BeanstalkD and a back-end written in PHP to retrieve jobs and then act on them. I wrapped the actual job-running in a bash-script to keep running if even if it exited (unless I do a 'exit(UNIQNUM);', when the script checks it and will actually exit). In that way, the restarted PHP script clears down any memory that may have been used, and can start afresh every 25/50/100 jobs it runs.

A couple of the advantages of using it is that you can set priorities and delays into a BeanstalkD job - "run this at a lower priority, but don't start for 10 seconds". I've also queued a number of jobs up at the some time (run this now, in 5 seconds and again after 30 secs).

With the appropriate network configuration (and running it on an accessible IP address to the rest of your network), you can also run a beanstalkd deamon on one server, and have it polled from a number of other machines, so if there are a large number of tasks being generated, the work can be split off between servers. If a particular set of tasks needs to be run on a particular machine, I've created a 'tube' which is that machine's hostname, which should be unique within our cluster, if not globally (useful for file uploads). I found it worked perfectly for image resizing, often returning the finished smaller images to the file system before the webpage itself that would refer to it would refer to the URL it would be arriving at.

I'm actually about to start writing a series of articles on this very subject for my blog (including some techniques for code that I've already pushed several million live requests through) - My URL is linked from my user profile here, on Stackoverflow.

(I've written a series of articles on the subject of Beanstalkd and queuing of jobs)

Friday, July 9, 2021
 
Tak
answered 4 Months ago
Tak
81

Well, if you're on Linux, you can use pcntl_fork to fork children off. The "master" then watches the children. Each child completes its task and then exists normally.

Personally, in my implementations I've never needed a message queue. I simply used an array in the "master" with locks. When a child got a job, it would write a lock file with the job id number. The master would then wait until that child exited. If the lock file still exists after the child exited, then I know the task wasn't completed, and re-launch a child with the same job (after removing the lock file). Depending on your situation, you could implement the queue in a simple database table. Insert jobs in the table, and check the table in the master every 30 or 60 seconds for new jobs. Then only delete them from the table once the child is finished (and the child removed the lock file). This would have issues if you had more than one "master" running at a time, but you could implement a global "master pid file" to detect and prevent multiple instances...

And I would not suggest forking with FastCGI. It can result in some very obscure problems since the environment is meant to persist. Instead, use CGI if you must have it web interface, but ideally use a CLI app (a deamon). To interface with the master from other processes, you can either use sockets for TCP communication, or create a FIFO file for communication.

As for detecting hung workers, you could implement a "heart-beat" system, where the child issues a SIG_USR1 to the master process every so many seconds. Then if you haven't heard from the child in two or three times that time, it may be hung. But the thing is since PHP isn't multi-threaded, you can't tell if a child is hung or if it's just waiting on a blocking resource (like a database call)... As for implementing the "heart-beat", you could use a tick function to automate the heart-beat (but keep in mind, blocking calls still won't execute)...

Sunday, August 1, 2021
 
eyei
answered 3 Months ago
37

.NET already provides a mechanism to report progress with the IProgress< T> and the Progress< T> implementation.

The IProgress interface allows clients to publish messages with the Report(T) class without having to worry about threading. The implementation ensures that the messages are processed in the appropriate thread, eg the UI thread. By using the simple IProgress< T> interface the background methods are decoupled from whoever processes the messages.

You can find more information in the Async in 4.5: Enabling Progress and Cancellation in Async APIs article. The cancellation and progress APIs aren't specific to the TPL. They can be used to simplify cancellation and reporting even for raw threads.

Progress< T> processes messages on the thread on which it was created. This can be done either by passing a processing delegate when the class is instantiated, or by subscribing to an event. Copying from the article:

private async void Start_Button_Click(object sender, RoutedEventArgs e)
{
    //construct Progress<T>, passing ReportProgress as the Action<T> 
    var progressIndicator = new Progress<int>(ReportProgress);
    //call async method
    int uploads=await UploadPicturesAsync(GenerateTestImages(), progressIndicator);
}

where ReportProgress is a method that accepts a parameter of int. It could also accept a complex class that reported work done, messages etc.

The asynchronous method only has to use IProgress.Report, eg:

async Task<int> UploadPicturesAsync(List<Image> imageList, IProgress<int> progress)
{
        int totalCount = imageList.Count;
        int processCount = await Task.Run<int>(() =>
        {
            int tempCount = 0;
            foreach (var image in imageList)
            {
                //await the processing and uploading logic here
                int processed = await UploadAndProcessAsync(image);
                if (progress != null)
                {
                    progress.Report((tempCount * 100 / totalCount));
                }
                tempCount++;
            }

            return tempCount;
        });
        return processCount;
}

This decouples the background method from whoever receives and processes the progress messages.

Sunday, August 29, 2021
 
ANIS BOULILA
answered 2 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :