Stuart Posted June 6, 2011
I've just started a new project that involves collating 104 years of magazine articles, currently in PDF format, and making them searchable. I've extracted the content from a DVD they produce and uploaded it to my development server. There are a large number of documents (approx. 44,000) that need to be organised, including PDFs, images, XMPs and THMs. At the moment they are arranged as they were on the DVD: a folder for each volume, with a sub-folder for each issue within that volume. My question is: should I keep this structure, or recurse through and pull all the PDFs into one directory, or at least into one directory per volume? Are there any pros or cons to organising them either way? The file names are all unique, so moving them into a single directory would not overwrite any files. PS. I'm using the Zend_Lucene article you wrote to index all the PDFs too, Larry, so thanks for that! Thanks
Larry Posted June 7, 2011
Hey Stuart, I'm glad that Zend_Lucene article was useful for you. Good luck with your project. I would be inclined to keep the files organized as they are, if only because that's the most convenient and easiest option. Putting too many files and directories in one folder can cause performance issues, though that depends upon the OS in use.
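As a rough illustration of the point above (not code from this thread), the volume/issue layout can be left in place and walked recursively whenever the PDFs need to be found for indexing. The sketch below is in Python rather than the PHP used with Zend_Lucene, and the function names `find_pdfs` and `files_per_directory` are hypothetical. The second helper simply reports how many files sit directly in each folder, which may help judge whether any single directory is getting large.

```python
import os
from collections import Counter


def find_pdfs(root):
    """Yield the path of every PDF under root, leaving files where they are."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Match the extension case-insensitively (.pdf, .PDF, ...).
            if name.lower().endswith(".pdf"):
                yield os.path.join(dirpath, name)


def files_per_directory(root):
    """Count the files directly inside each directory under root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        counts[dirpath] = len(filenames)
    return counts
```

Because the file names are unique, an indexer could store each path yielded by `find_pdfs` alongside the extracted text, so nothing has to be moved for search to work.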
Stuart Posted June 7, 2011
OK, thanks Larry. I think I'll leave them as they are for now then. I did wonder if there'd be performance issues. Right now the issue appears to be extracting the text from the PDFs, so there's a good chance I'll be creating a Linux install post relatively soon!