Larry Ullman's Book Forums

I have an XML parsing script that's behaving a bit oddly. It parses a daily XML feed containing jobs to be featured on my career board. The most urgent problem is that old jobs are not dropping off: if a job in my db is not present in the new feed, it needs to be deactivated. Here is my preliminary code, which I was hoping to get some outside thoughts on. -Thanks

 

$doc = new DOMDocument(); // create a new DOMDocument to walk the XML tree
$doc->load( '/path/daily_jobs.xml' ); // load the daily job feed
$jobAds = $doc->getElementsByTagName( "jobAd" ); // node list with every occurrence of a jobAd

foreach( $jobAds as $jobAd ) { // loop through each job in the feed
    $externalJobAdIds = $jobAd->getElementsByTagName( "externalJobAdId" ); // capture externalJobAdId nodes to use as the unique identifier

    $newJobs = array(); // create array to hold the IDs of the new jobs
    $externalJobAdIds = $newJobs[]; // push IDs into array

    $query = "SELECT externalJobAdId FROM jobs WHERE company_id = 8060"; // query for job IDs already present in db
    $oldJobs = mysql_query($query); // store results

    foreach($oldJobs as $job) { // loop through results
        if(!in_array($newJobs)) { // check if old jobs are still present in today's feed
            $query = "DELETE * FROM jobs WHERE externalJobAdId = '" . $job . "'"; // delete job if it is not in today's feed
            mysql_query($query);
        }
    }
}
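
The comparison described above (IDs in the database that are missing from today's feed) can also be expressed with `array_diff()`. This is only a minimal sketch: the two ID lists are hard-coded stand-ins, and in the real script `$feedIds` would come from the XML and `$dbIds` from the `jobs` table.

```php
<?php
// Hypothetical sample data standing in for the feed and the database rows.
$feedIds = array('A100', 'A101', 'A103');          // IDs found in today's feed
$dbIds   = array('A099', 'A100', 'A101', 'A102');  // IDs currently in the jobs table

// array_diff() returns the values of the first array that are absent from the second,
// i.e. jobs that are in the database but no longer in the feed.
// array_values() reindexes the result from 0.
$staleIds = array_values(array_diff($dbIds, $feedIds));

print_r($staleIds);
```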


I've updated the script. Instead of creating two arrays, I now create only one, for the new jobs contained in the daily feed, and then use MySQL's "NOT IN" operator to compare the old jobs with the new ones and delete any that are outdated.

It saved me about 9 lines of code and moved some of the workload into the database, which is also a good thing.

The trouble is that I now get a "Cannot use [] for reading" error whenever I run the script. I don't know how else to populate my jobs array. Below is the new code.

 

Sorry if this isn't the most appropriate forum to place this in. Perhaps it would've been better placed in the Advanced PHP 5 forum.

 

 

 

<?php

$doc = new DOMDocument(); // create a new DOMDocument to walk the XML tree
$doc->load( '/path/daily_jobs.xml' ); // load the daily job feed
$jobAds = $doc->getElementsByTagName( "jobAd" ); // node list with every occurrence of a jobAd
$newJobs = array(); // create array to hold the IDs of the new jobs

foreach( $jobAds as $jobAd ) { // loop through each job in the feed
    $externalJobAdIds = $jobAd->getElementsByTagName( "externalJobAdId" ); // capture externalJobAdId nodes to use as the unique identifier
    $externalJobAdIds = $newJobs[]; // push IDs into array
}

$implodedJobs = implode(",",$newJobs);
$query = "DELETE FROM `jobs` WHERE `company_id`= 8060 AND `externalJobAdId` NOT IN($implodedJobs)";

?>
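
Two things are worth guarding against in the `NOT IN` query: string IDs need to be quoted or bound, and an empty ID list produces invalid SQL (`NOT IN()`). The following is a minimal sketch rather than the original script. The helper name `buildStaleJobsDelete` is made up for illustration, and the `?` placeholders assume a driver that supports prepared statements, such as PDO or mysqli.

```php
<?php
/**
 * Build a parameterised DELETE for jobs missing from the feed.
 * Hypothetical helper; table and column names are taken from the post above.
 *
 * @return array|null array($sql, $params), or null when the feed produced no IDs.
 */
function buildStaleJobsDelete($companyId, array $feedIds)
{
    if (count($feedIds) === 0) {
        // Refuse to build a query: an empty list may just mean the feed failed to load.
        return null;
    }

    // One "?" placeholder per ID, e.g. "?,?,?".
    $placeholders = implode(',', array_fill(0, count($feedIds), '?'));

    $sql = "DELETE FROM `jobs` WHERE `company_id` = ? "
         . "AND `externalJobAdId` NOT IN ($placeholders)";

    // Bound values in placeholder order: company ID first, then the feed IDs.
    $params = array_merge(array($companyId), array_values($feedIds));

    return array($sql, $params);
}

list($sql, $params) = buildStaleJobsDelete(8060, array('A100', 'A101'));
echo $sql, "\n";
print_r($params);
```

With an existing PDO connection in `$pdo` (assumed, not shown), the pair could be run as `$pdo->prepare($sql)->execute($params);`.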


Your assignment operation is backwards. You can do either of the following:

 

$newJobs[$i] = $externalJobAdIds; // Requires a counter variable $i that increments appropriately.

array_push($newJobs,$externalJobAdIds); // array_push info: http://php.net/manual/en/function.array-push.php
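
The error message comes from `$newJobs[]` appearing on the right-hand side of the assignment: empty brackets can only be written to, not read from. A third option, equivalent to `array_push()` for a single value, is to put `$newJobs[]` on the left. A minimal sketch, with placeholder strings instead of the real IDs:

```php
<?php
$newJobs = array();

// Placeholder value; in the real script this would be the ID read from the feed.
$externalJobAdId = 'A100';

// "[]" on the left-hand side appends to the end of the array.
$newJobs[] = $externalJobAdId;
$newJobs[] = 'A101';

print_r($newJobs);
```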


Thank you for the very informative reply, HartleySan. I've tried array_push(), but when I inspect the contents of the array with print_r(), the output has no real content and looks like this:

 

Array ( [0] => DOMNodeList Object ( ) [1] => DOMNodeList Object ( )

 

Though when I echo $externalJobAdId before it's pushed into the array, I get the proper id. Any thoughts?

 

Thanks again for your initial reply.
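
The `print_r()` output suggests that what gets pushed is the `DOMNodeList` returned by `getElementsByTagName()`, not the text inside the element. Below is a minimal sketch of pulling the text out with `item(0)` and `nodeValue`, assuming each `jobAd` holds one `externalJobAdId`. The inline XML is an invented stand-in for the real feed, whose structure may differ.

```php
<?php
// Invented sample feed; the real structure may differ.
$xml = '<jobs>'
     . '<jobAd><externalJobAdId>A100</externalJobAdId></jobAd>'
     . '<jobAd><externalJobAdId>A101</externalJobAdId></jobAd>'
     . '</jobs>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$newJobs = array();

foreach ($doc->getElementsByTagName('jobAd') as $jobAd) {
    // getElementsByTagName() returns a DOMNodeList, even for a single match.
    $idNodes = $jobAd->getElementsByTagName('externalJobAdId');

    if ($idNodes->length > 0) {
        // item(0) is the first matching element; nodeValue is its text content.
        $newJobs[] = trim($idNodes->item(0)->nodeValue);
    }
}

print_r($newJobs);
```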


I am familiar with the DOMDocument library, but I have never used it myself. However, my advice is to echo values after each step of the process (each operation), and see where your error is.

 

It could be anything from accidentally spelling "jobAd" differently from the nodes in the XML file (remember, names are case sensitive) to calling a method incorrectly. The path to the XML file you're loading could also be wrong. Really, there are too many things at play to guess accurately.

 

Test something after each operation until you encounter something you didn't expect.
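
As a rough sketch of that approach, each step can be checked before moving on: `load()` returns `false` on failure, and a `DOMNodeList` exposes a `length` property. The function name `inspectFeed` is hypothetical, and the demonstration writes an invented feed to a temporary file rather than reading the real one.

```php
<?php
// Hypothetical helper: reports what each parsing step produced.
function inspectFeed($path)
{
    $report = array();

    $doc = new DOMDocument();
    // Step 1: did the file load? "@" silences the warning so the result can be checked.
    $report['loaded'] = @$doc->load($path);
    if (!$report['loaded']) {
        return $report;
    }

    // Step 2: how many <jobAd> elements were found? Tag names are case sensitive.
    $jobAds = $doc->getElementsByTagName('jobAd');
    $report['jobAds'] = $jobAds->length;

    // Step 3: how many of them contain an <externalJobAdId>?
    $report['withId'] = 0;
    foreach ($jobAds as $jobAd) {
        if ($jobAd->getElementsByTagName('externalJobAdId')->length > 0) {
            $report['withId']++;
        }
    }

    return $report;
}

// Demonstration against a temporary file with invented content.
$tmp = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents(
    $tmp,
    '<jobs><jobAd><externalJobAdId>A100</externalJobAdId></jobAd><jobAd/></jobs>'
);
var_dump(inspectFeed($tmp));
unlink($tmp);
```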


Thanks for your suggestions. I'll continue to tinker with the code. For now I'm manually deleting the entries and then rerunning the script so that only the new jobs are present in the db. That's not very efficient, especially since the company I hired to write the script left it full of other quirks that require additional manual db administration with every reload. But thank you again for the very educational suggestions. They'll be helpful not just in this project but in others soon to come.
