Larry Ullman's Book Forums

Large Data Set Export Best Practice



Larry, I am loving your book. I learned PHP on my own a couple of years ago and have been quite productive, but I never felt I had a full grasp of OOP. Your book quickly helped me understand the core concepts. Thanks!

 

So here is my situation: I have an application where a user is allowed to configure complex queries using a point-and-click configurator. This eliminates the need for the user to understand SQL and allows us to keep the underlying schema confidential.

 

The user input is used in the backend to generate the actual MySQL query.  This can involve multiple table joins.  
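To give you an idea, the translation works roughly like this (a simplified sketch; the table, column, and operator names here are made up, not my real schema):

```php
<?php
// Simplified sketch: configurator choices are mapped through whitelists,
// so user input never reaches the SQL as a raw string.
$config = $_POST; // field, op, and value as chosen in the configurator UI

$columns   = ['order_id' => 'o.id', 'customer' => 'c.name', 'total' => 'o.total'];
$operators = ['eq' => '=', 'gt' => '>', 'lt' => '<'];

if (!isset($columns[$config['field']], $operators[$config['op']])) {
    throw new InvalidArgumentException('Invalid query configuration.');
}

$sql = 'SELECT o.id, c.name, o.total
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        WHERE ' . $columns[$config['field']] . ' '
                . $operators[$config['op']] . ' :value';
```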

 

The output can be: (1) downloaded to the browser as a CSV file, (2) sent to an SFTP or CIFS mount as a CSV file, or (3) sent to a remote database.

 

Before the data can be exported, various manipulations must be performed on it.

 

Some of these queries can generate very large data sets, which precludes holding them in memory.
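To make the memory problem concrete: a fetchAll() on one of these result sets exhausts PHP's memory, whereas an unbuffered query lets me stream one row at a time (simplified sketch; the table, credentials, and file path are made up):

```php
<?php
// Sketch: stream a large result set to CSV one row at a time.
$pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass');

// Unbuffered mode: MySQL hands rows over as they are fetched,
// so the full result set never sits in PHP's memory.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->query('SELECT id, name, total FROM orders'); // placeholder query
$fp   = fopen('/tmp/export.csv', 'w');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    fputcsv($fp, $row); // only one row in memory at a time
}
fclose($fp);
```

The catch is that the manipulations I mentioned would have to happen per row inside that loop, which is what prompts the file-versus-temp-table question below.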

 

So my question is this: what is considered PHP best practice for this type of use case? Should the large data set be written to a file using a single query, after which it can be manipulated and then exported in the appropriate format? Or should a temporary database table be used?

 

Using a slice technique (i.e., running the query in batches) seems problematic to me, as it looks expensive compared to a single query.
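For example (again with made-up names), each OFFSET slice rescans every row it skips, so late batches get progressively slower; a keyset-style WHERE clause avoids the rescan, but the join work still re-executes for every batch:

```php
<?php
// OFFSET slicing: slice 100 scans and discards the first 990,000 rows.
$sql = 'SELECT id, total FROM orders ORDER BY id LIMIT 10000 OFFSET 990000';

// Keyset slicing: start each batch after the last id seen (needs an
// index on id), but the underlying joins still run once per batch.
$sql = 'SELECT id, total FROM orders WHERE id > :last_id ORDER BY id LIMIT 10000';
```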

 

Thanks for your insights on this.



Thanks so much for the nice words! I really appreciate it. And apologies for the delayed reply. 

 

This sounds like an interesting application, and certainly one that could have issues if not implemented well. I think the most important thing is that you probably want to start the process and then email the user when their file is ready. Expecting the browser to hang while your server does everything it needs to do is probably a recipe for trouble.
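As a rough sketch of that flow (the export_jobs table, runExport(), and the mail details are all invented, not from your app):

```php
<?php
// Sketch of the queue-and-email flow; all table and function names invented.

// 1. Web request: record the job and respond immediately.
$pdo  = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass');
$stmt = $pdo->prepare(
    'INSERT INTO export_jobs (user_id, query_config, status)
     VALUES (:uid, :cfg, "pending")'
);
$stmt->execute([':uid' => $userId, ':cfg' => json_encode($queryConfig)]);
echo 'Your export is being prepared; we will email you a link when it is ready.';

// 2. Worker script, run from cron: process one pending job at a time.
$job = $pdo->query(
    'SELECT * FROM export_jobs WHERE status = "pending" ORDER BY id LIMIT 1'
)->fetch(PDO::FETCH_ASSOC);

if ($job) {
    $file = runExport($job); // hypothetical function that builds the CSV
    mail($userEmail, 'Your export is ready', "Download it here: $file");
    $pdo->prepare('UPDATE export_jobs SET status = "done" WHERE id = ?')
        ->execute([$job['id']]);
}
```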

 

Second, in terms of generating the file, I thiiiiink the right answer is to populate a temp table with the data, then do a simple select on that and write it to a file. But there are a lot of factors that go into the performance of something like this (a complex database query plus a file write), so the actual best way to decide would be to benchmark a couple of different methods (benchmarking even specific components of the process) and then choose the approach that works best.
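Something along these lines, as a sketch (the report_tmp table, the joins, and the output path are all invented; note that INTO OUTFILE requires the FILE privilege and a path the MySQL server can write to):

```php
<?php
// Sketch of the temp-table approach; every name here is invented.
$pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass');

// 1. Materialize the complex multi-join query once.
$pdo->exec(
    'CREATE TEMPORARY TABLE report_tmp AS
     SELECT o.id, c.name, o.total
     FROM orders o
     JOIN customers c ON c.id = o.customer_id'
);

// 2. Apply the required manipulations as set-based statements here,
//    e.g. UPDATE report_tmp SET ...

// 3. Simple select; MySQL writes the CSV itself, so PHP never
//    holds the data at all.
$pdo->exec(
    "SELECT * FROM report_tmp
     INTO OUTFILE '/var/lib/mysql-files/report.csv'
     FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
     LINES TERMINATED BY '\\n'"
);
```

And for the benchmarking, even something as simple as microtime(true) around each stage will tell you where the real cost is.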
