
In PDO, a connection can be made persistent using the PDO::ATTR_PERSISTENT attribute. According to the PHP manual:

Persistent connections are not closed at the end of the script, but are cached and re-used when another script requests a connection using the same credentials. The persistent connection cache allows you to avoid the overhead of establishing a new connection every time a script needs to talk to a database, resulting in a faster web application.
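For reference, a minimal sketch of how such a connection is requested. The helper name `open_persistent` is hypothetical, and the in-memory SQLite DSN is only a placeholder so the snippet runs without a server (it assumes the pdo_sqlite extension is loaded). The attribute has to be passed in the constructor's options array; setting it later with `setAttribute()` has no effect on the connection that was already opened.

```php
<?php
// Hypothetical helper: opens a PDO connection and asks for it to be persistent.
// The DSN and credentials are whatever the application already uses.
function open_persistent(string $dsn, ?string $user = null, ?string $pass = null): PDO
{
    return new PDO($dsn, $user, $pass, [
        // Must be supplied here, at construction time.
        PDO::ATTR_PERSISTENT => true,
        // Throw exceptions instead of failing silently.
        PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Placeholder DSN for illustration; a real application would pass its own.
$db = open_persistent('sqlite::memory:');

// Reports whether the handle was opened as a persistent connection.
var_dump($db->getAttribute(PDO::ATTR_PERSISTENT));
```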

The manual also recommends against using persistent connections with the PDO ODBC driver, because doing so may interfere with ODBC Connection Pooling.

So apparently there seem to be no drawbacks to using persistent connections in PDO, except in the last case. However, I would like to know if there are any other disadvantages of using this mechanism, i.e., a situation where this mechanism results in performance degradation or something like that.

 Answers

73

Please be sure to read this answer below, which details ways to mitigate the problems outlined here.


The same drawbacks exist using PDO as with any other PHP database interface that does persistent connections: if your script terminates unexpectedly in the middle of database operations, the next request that gets the leftover connection will pick up where the dead script left off. The connection is held open at the process manager level (Apache for mod_php, the current FastCGI process if you're using FastCGI, etc.), not at the PHP level, and PHP doesn't tell the parent process to let the connection die when the script terminates abnormally.

If the dead script locked tables, those tables will remain locked until the connection dies or the next script that gets the connection unlocks the tables itself.

If the dead script was in the middle of a transaction, that can block a multitude of tables until the deadlock timer kicks in, and even then, the deadlock timer can kill the newer request instead of the older request that's causing the problem.

If the dead script was in the middle of a transaction, the next script that gets that connection also gets the transaction state. It's very possible (depending on your application design) that the next script might not actually ever try to commit the existing transaction, or will commit when it should not have, or roll back when it should not have.

This is only the tip of the iceberg. It can all be mitigated to an extent by always trying to clean up after a dirty connection on every single script request, but that can be a pain depending on the database. Unless you have identified creating database connections as the one thing that is a bottleneck in your script (this means you've done code profiling using xdebug and/or xhprof), you should not consider persistent connections as a solution to anything.
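A minimal sketch of what that per-request cleanup could look like with PDO. The helper name `reset_connection` and its `$cleanupSql` parameter are hypothetical, and the MySQL statement mentioned in the comment is an assumption about what a MySQL application might pass. `PDO::inTransaction()` may not see a transaction that an earlier request opened with raw SQL on every driver and PHP version, which is why the helper also accepts extra statements. The demo uses in-memory SQLite (assuming pdo_sqlite is available) only so that it runs without a server.

```php
<?php
// Hypothetical helper: call it right after obtaining a (possibly reused)
// persistent connection, before the request does any real work.
function reset_connection(PDO $db, array $cleanupSql = []): void
{
    // Discard any transaction that PDO knows is still open.
    if ($db->inTransaction()) {
        $db->rollBack();
    }
    // Run driver-specific cleanup supplied by the caller.
    // Assumption: a MySQL application might pass ['UNLOCK TABLES'] here.
    foreach ($cleanupSql as $sql) {
        $db->exec($sql);
    }
}

// Demo: simulate a request that died in the middle of a transaction.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE parts (id INTEGER)');
$db->beginTransaction();
$db->exec('INSERT INTO parts VALUES (1)');

// The "next request" cleans up before touching the connection.
reset_connection($db);

var_dump($db->inTransaction());
var_dump((int) $db->query('SELECT COUNT(*) FROM parts')->fetchColumn());
```

This only covers state that can be undone with a rollback or an explicit statement; session variables, temporary tables and similar leftovers would need their own handling, which depends on the database.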

Further, most modern databases (including PostgreSQL) have their own preferred ways of performing connection pooling that don't have the immediate drawbacks that plain vanilla PHP-based persistent connections do.


To clarify a point, we use persistent connections at my workplace, but not by choice. We were encountering weird connection behavior, where the initial connection from our app server to our database server was taking exactly three seconds, when it should have taken a fraction of a fraction of a second. We think it's a kernel bug. We gave up trying to troubleshoot it because it happened randomly and could not be reproduced on demand, and our outsourced IT didn't have the concrete ability to track it down.

Regardless, when the folks in the warehouse are processing a few hundred incoming parts, and each part is taking three and a half seconds instead of a half second, we had to take action before they kidnapped us all and made us help them. So, we flipped a few bits on in our home-grown ERP/CRM/CMS monstrosity and experienced all of the horrors of persistent connections first-hand. It took us weeks to track down all the subtle little problems and bizarre behavior that happened seemingly at random. It turned out that those once-a-week fatal errors that our users diligently squeezed out of our app were leaving locked tables, abandoned transactions and other unfortunate wonky states.

This sob-story has a point: It broke things that we never expected to break, all in the name of performance. The tradeoff wasn't worth it, and we're eagerly awaiting the day we can switch back to normal connections without a riot from our users.

Wednesday, March 31, 2021
 
Ultimater
answered 7 Months ago
41

My simple-minded (ISAM, no transactions) C-language app runs for eight hours a day, updating multiple tables in one database over one single MySQL connection that stays open the whole time. It works just fine. Anytime there's any kind of MySQL error (not only server gone away), the code just calls mysql_real_connect() again and it picks right up without any trouble. Reconnection is one of the places where, in my opinion, MySQL functions flawlessly.

But there's plenty of controversy and discussion about the goodness/badness of persistent connections. You can find some of it here:

http://www.google.com/webhp?hl=&sourceid=navclient-ff&rlz=1B3GGLL_enUS384US384&ie=UTF-8#rlz=1B3GGLL_enUS384US384&hl=en&source=hp&q=mysql+persistent+connection&aq=0&aqi=g4g-m5&aql=&oq=mysql+persistent+conn&gs_rfai=Ch2c6iCchTO3zG4i6MZ-i7JAOAAAAqgQFT9BAKCs&fp=ff274912d96214e6

-- HTH

Wednesday, March 31, 2021
 
Uours
answered 7 Months ago
42

Had already downloaded the driver and it didn't work. Found a new site for the driver and this one works.

https://github.com/Microsoft/msphpsql/releases

php.ini line added:

extension=php_pdo_sqlsrv_7_nts.dll
Wednesday, March 31, 2021
 
njai
answered 7 Months ago
73

showdev's comment is correct that the PDO DSN does not allow host:port syntax.

If your CMS is defining DB_HOST outside of your control, you can't use that constant directly. But you can pull information out of it.

// Turn "host:port" into "host;port=port" so it fits the PDO DSN syntax.
$host_port = preg_replace('/:(\d+)/', ';port=${1}', DB_HOST);
$db = new PDO("mysql:host={$host_port};dbname=".DB_NAME.";charset=utf8",
    DB_USER, DB_PW, array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
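
To check the substitution on its own, here is a small self-contained sketch. The function name `dsn_host_part` and the sample host names are hypothetical. Unlike the pattern in the answer, this version anchors the match at the end of the string so that only a trailing numeric port is rewritten.

```php
<?php
// Hypothetical helper: converts "host:port" into the "host;port=N" form
// used inside a PDO MySQL DSN. Hosts without a port pass through unchanged.
function dsn_host_part(string $host): string
{
    // \d+ matches the digits of the port; $ restricts it to the end of the string.
    return preg_replace('/:(\d+)$/', ';port=${1}', $host);
}

echo dsn_host_part('db.example.com:3307'), "\n"; // db.example.com;port=3307
echo dsn_host_part('localhost'), "\n";           // localhost
```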
Friday, May 28, 2021
 
Chvanikoff
answered 5 Months ago
89

Here are a variety of answers:

  1. Abundance of options. First, the concern is valid, but your list of choices is a little more narrow than it should be. HDF5/netCDF4 is an excellent option, and works well with Python, Matlab, and many other systems. HDF5 is superior to Python's pickle storage in many ways - check out PyTables and you'll very likely see good speedups. Matlab used to have (and may still have) some issues with how large cell (or maybe struct) arrays are stored in HDF5. It's not that it can't do it, but that it was god-awful slow. That's Matlab's problem, not HDF5's. While these are great choices, you may also consider whether HDF5 is adequate: consider if you have some very large files and could benefit from a proprietary encoding, either for speed of access or compression. It's not too hard to do raw binary storage in any language and you could easily design something like the file storage of bigmemory (i.e. speed of access). In fact, you could even use bigmemory files in other languages - it's really a very simple format. HDF5 is certainly a good starting point, but there is no one universal solution for data storage and access, especially when one gets to very large data sets. (For smaller data sets, you might also take a look at Protocol Buffers or other serialization formats; Dirk did RProtoBuf for accessing these in R.) For compression, see the next suggestion.

  2. Size. As Dirk mentioned, the file formats can be described as application neutral and application dependent. Another axis is domain-independent (or domain-ignorant) or domain-dependent (domain-smart ;-)) storage. If you have some knowledge of how your data will arise, especially any information that can be used in compression, you may be able to build a better format than anything that standard compressors may be able to do. This takes a bit of work. Compressors other than gzip and bzip also allow you to analyze large volumes of data and develop appropriate compression "dictionaries" so that you can get much better compression than you would with .Rdat files. For many kinds of datasets, storing the delta between different rows in a table is a better option - it can lead to much greater compressibility (e.g. lots of 0s may appear), but only you know whether that will work for your data.

  3. Speed and access. .Rdat does not support random access. It does not have built-in support for parallel I/O (though you can serialize to a parallel I/O storage, if you wish). There are many things one could do here to improve things, but it's a thousand cuts to glue stuff on to .Rdat over and over again, rather than just switch to a different storage mechanism and blow the speed and access issues away. (This isn't just an advantage of HDF5: I have frequently used multicore functions to parallelize other I/O methods, such as bigmemory.)

  4. Update capabilities. R does not have a very nice way to add objects to a .Rdat file. It does not, to my knowledge, offer any "Viewers" to allow users to visually inspect or search through a collection of .Rdat files. It does not, to my knowledge, offer any built-in versioning record-keeping of objects in the file. (I do this via a separate object in the file, which records the versions of scripts that generated the objects, but I will outsource that to SQLite in a future iteration.) HDF5 has all of these. (Also, the lack of random access affects updating of the data: with .Rdat files, you have to save the whole object.)

  5. Communal support. Although I've advocated your own format, that is for extreme data sizes. Having libraries built for many languages is very helpful in reducing the friction of exchanging data. For most simple datasets (and simple still means "fairly complex" in most cases) or moderate to fairly large datasets, HDF5 is a good format. There are ways to beat it on specialized systems, certainly. Still, it is a nice standard and will mean less organizational effort will be spent supporting either a proprietary or application-specific format. I have seen organizations stick to a format for many years past the use of the application that generated the data, just because so much code was written to load and save in that application's format and GBs or TBs of data were already stored in its format (this could be you & R someday, but this arose from a different statistical suite, one that begins with the letter "S" and ends with the letter "S" ;-)). That's a very serious friction for future work. If you use a widespread standard format, you can then port between it and other widespread standards with much greater ease: it's very likely someone else has decided to tackle the same problem, too. Give it a try - if you do the converter now, but don't actually convert it for use, at least you have created a tool that others could pick up and use if there comes a time when it's necessary to move to another data format.

  6. Memory. With .Rdat files, you have to load or attach the file in order to access objects. Most of the time, people load the file. Well, if the file is very big, there goes a lot of RAM. So, either one is a bit smarter about using attach or separates objects into multiple files. This is quite a nuisance for accessing small parts of an object. To that end, I use memory mapping. HDF5 allows for random access to parts of a file, so you need not load all of your data just to access a small part. It's just part of the way things work. So, even within R, there are better options than just .Rdat files.

  7. Scripts for conversion. As for your question about writing a script - yes, you can write a script that loads objects and saves them into HDF5. However, it is not necessarily wise to do this on a huge set of heterogeneous files, unless you have a good understanding of what's going to be created. I couldn't begin to design this for my own datasets: there are too many one-off objects in there, and creating a massive HDF5 file library would be ridiculous. It's better to think of it like starting a database: what will you want to store, how will you store it, and how will it be represented and accessed?

Once you get your data conversion plan in place, you can then use tools like Hadoop or even basic multicore functionality to unleash your conversion program and get this done as quickly as possible.

In short, even if you stay in R, you are well advised to look at other possible storage formats, especially for large, growing, data sets. If you have to share data with others, or at least provide read or write access, then other formats are very much advised. There's no reason to spend your time maintaining readers/writers for other languages - it's just data not code. :) Focus your code on how to manipulate data in sensible ways, rather than spend time working on storage - other people have done a very good job on that already.

Monday, August 9, 2021
 
learningpython
answered 3 Months ago