Larry Ullman's Book Forums

Xml Parsing Error: Not Well-Formed



I am trying to use a PHP proxy script to display book titles from the ISBNdb web service. The following Ajax/JavaScript code calls a PHP script that retrieves data from ISBNdb. The PHP script gives this error:

 

 

XML Parsing Error: not well-formed
Location: http://localhost/readinglog/isbn3.php?searchterm=bible&submit=Search
Line Number 1, Column 11:
biblebible<?xml version="1.0" encoding="UTF-8"?>

 

JavaScript code:

 

 

var titlesArray = new Array();

function handleAjaxResponse(e) {
    'use strict';
    if (typeof e == 'undefined') e = window.event;
    var ajax = e.target || e.srcElement;
    //console.log(ajax);
    if (ajax.readyState == 4) {
        if ((ajax.status >= 200 && ajax.status < 300) || ajax.status == 304) {
            console.log(ajax.responseXML);
            if (ajax.responseXML) {
                console.log(ajax.responseXML);
                var allBooks = ajax.responseXML.getElementsByTagName('BookData');
                for (var i = 0, count = allBooks.length; i < count; i++) {
                    titlesArray[i] = allBooks[i].getElementsByTagName('Title')[0].firstChild;
                    console.log(titlesArray);
                }
            }
        }
    }
}

function sendValue() {
    var searchTerm = document.getElementById('searchTerm');
    console.log(searchTerm.value);
    var ajax = getXMLHttpRequestObject();
    ajax.onreadystatechange = handleAjaxResponse;
    ajax.open('GET', 'isbn3.php?searchterm=' + encodeURIComponent(searchTerm.value), true);
    //ajax.open('GET', './isbn3.php', true);
    ajax.send(null);
    // return false;
}

window.onload = function() {
    'use strict';
    document.getElementById('searchTerm').onkeyup = sendValue;
}

 

 

Code for isbn3.php:

 

 

if (isset($_GET['searchterm']) && is_string($_GET['searchterm']))
{
    // Hold the search term in a variable, and typecast it:
    $isbnQuery = (string) $_GET['searchterm'];

    if ($isbnQuery == NULL)
    {
        $error[] = "You forgot to enter ISBN. Please go back and correct the error.";
    }

    if (empty($error))
    {
        $isbnData = "http://isbndb.com/api/books.xml?access_key=keyno&results=details,texts&index1=title&value1=$isbnQuery";

        //$xmlData = @simplexml_load_file($isbnData); //or die("Invalid ISBN ERROR");
        $xmlDoc = file_get_contents($isbnData); // Read the entire XML document into a string.
        //echo '<?xml version="1.0" encoding="utf-8" standalone="yes">';
        header("Content-Type: application/xml");
        echo $xmlDoc;
    }
}
?>


The XML is from ISBNdb: http://isbndb.com/data-intro.html

However, I simplified the PHP script as follows:

 

 

<?php

$stitle = $_GET['searchterm'];
//echo $stitle;

$url = "http://isbndb.com/api/books.xml?access_key=mykey&results=details,texts&index1=title&value1=$stitle";

$xml = file_get_contents($url);
header("Content-Type: applicaton/xml");
echo $xml;
?>

 

It shows book titles, but the list is quite long, so I can't find the exact title I'm after. I still need to work on it to get to the root of the problem.

Thanks again.


Sorry for the delayed response. I've been busy recently. I made the simple example below to illustrate how to get data from an XML file.

 

I grabbed the sample XML from the following URL and placed it in a file called book.xml as follows:

http://isbndb.com/data-intro.html

 

book.xml

<?xml version="1.0" encoding="UTF-8"?>
<ISBNdb server_time="2005-07-29T03:02:22">
<BookList total_results="1">
 <BookData book_id="paul_laurence_dunbar" isbn="0766013502">
  <Title>Paul Laurence Dunbar</Title>
  <TitleLong>Paul Laurence Dunbar: portrait of a poet</TitleLong>
  <AuthorsText>Catherine Reef</AuthorsText>
  <PublisherText publisher_id="enslow_publishers">
Berkeley Heights, NJ: Enslow Publishers, c2000.
  </PublisherText>
  <Summary>
A biography of the poet who faced racism and devoted himself
to depicting the black experience in America.
  </Summary>
  <Notes>
"Works by Paul Laurence Dunbar": p. 113-114.
Includes bibliographical references (p. 124) and index.
  </Notes>
  <UrlsText></UrlsText>
  <AwardsText></AwardsText>
  <Prices>
<Price store_id="alibris" is_in_stock="1" is_new="0"
	   check_time="2005-07-29T01:18:18" price="14.92"/>
<Price store_id="amazon" is_in_stock="1" is_new="1"
	   check_time="2005-07-29T01:18:20" price="26.60" />
  </Prices>
 </BookData>
</BookList>
</ISBNdb>

 

I then ran the following sample script from XAMPP:

<!DOCTYPE html>
<html lang="en">
 <head>
   <meta charset="UTF-8">
   <title>responseXML test</title>
 </head>
 <body>
   <script>
     var ajax = new XMLHttpRequest();
     ajax.open('get', 'book.xml', true);
     ajax.onreadystatechange = handleResponse;
     ajax.send(null);

     function handleResponse() {
       if ((ajax.readyState === 4) && (ajax.status === 200)) {
         var xmlDoc = ajax.responseXML;
         var title = xmlDoc.getElementsByTagName('Title');
         document.body.innerHTML = title[0].firstChild.nodeValue;
       }
     }
   </script>
 </body>
</html>

 

The main thing I noticed was that you didn't use the nodeValue property. Perhaps that is the issue.

While the above script is simple, I think it's a good starting point for hopefully getting your more complex script working.

Good luck!
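
To go from the first title to every title in the response, something like the following might work. It is only a sketch: collectTitles is a made-up helper name, and it assumes the argument is ajax.responseXML (or any object that provides getElementsByTagName).

```javascript
// Hypothetical helper: gather the text of every <Title> element.
// xmlDoc is assumed to be ajax.responseXML or a compatible object.
function collectTitles(xmlDoc) {
  var titles = [];
  var nodes = xmlDoc.getElementsByTagName('Title');
  for (var i = 0, count = nodes.length; i < count; i++) {
    var textNode = nodes[i].firstChild;
    // An empty element such as <Title></Title> has no child text node,
    // so fall back to an empty string instead of throwing.
    titles.push(textNode ? textNode.nodeValue : '');
  }
  return titles;
}
```

Inside handleResponse, you could then call collectTitles(ajax.responseXML) and loop over the returned array.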


Thanks a lot for taking the time to write code for this. I want to display an autocomplete list of book titles. The problem is that ISBNdb returns only 10 books per response. So when a user starts typing, my script retrieves the first 10 titles that contain that first character anywhere in the title, and I never get the title I want. I hope to find a workaround for this problem.


Well, since you don't own the DB, that's going to be pretty hard to do, not to mention that it's not the best thing to do unless the site openly agrees to let you mine their data.

I think the best thing you can do is this: each time a user types a character, perform a search using the ISBN DB's search feature, use Ajax, etc. to get the HTML for the search results page, mine that data for the first ten results, and then use those titles to search the individual book pages for more info. That's about all you can do short of having the entire DB yourself, which is unlikely.


HartleySan, thanks again for the useful input.

However, I don't understand the details of how to use the results page and mine that data. Another thing I noticed is that ISBNdb's API returns results for a given search term, but I don't think there is any way we can use its search feature.


It's great that there's an API available. I wish you had mentioned that from the start.

An API is great for two reasons:

 

1) It'll greatly simplify what you're trying to accomplish.

2) It means that the ISBN site is completely open and fine with you searching on their site.

 

With that said, I think I figured out the problem. Please take a look at the following page:

http://isbndb.com/docs/api/30-keys.html

 

It looks like the ISBN people regulate access to the DB, so while it's free, you need to register with them and enter a valid access key in order to use their service. Once you have an access key, if you read the rest of the API documentation, you should be okay.

 

Please let me know if it works out.

Thanks.


I do have an access key, as I registered with the ISBNdb site and have been using the API for a few months now. I can pull and print the required data using a book's ISBN. (I have implemented this feature on my website: www.myreadinglog.net)

My problem is how to display the data when I query by a book's title. Sometimes my query returns more than 5000 titles, but I don't know how to display them and get the exact title I want. If I search for a title containing the word "the", the query returns many titles starting with numbers, and so on.

I'm missing something somewhere. However, thanks again.

One more question:

How do I echo an XML document through PHP?

 

 

$xml = file_get_contents($url);

header("Content-Type: applicaton/xml");

echo $xml; //This statement prompts to download xml document.

 

 

$xmlData = @simplexml_load_file($url);

header("Content-Type: application/xml");

echo $xmlData; //These statements show a syntax error in the XML document.


The following PHP script brings back the required data, but gives a syntax error like "XML Parsing Error: not well-formed" that points to unexpected XML characters like '&'. (The XML comes from the ISBNdb API.) How do I deal with them? htmlspecialchars works on strings, but simplexml_load_file returns an object.

 

 

header('Content-Type: text/xml');
echo '<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<item>';

$stitle = $_GET['searchterm'];

$url = "http://isbndb.com/api/books.xml?access_key=mykey&results=details,texts&index1=title&value1=$stitle";

$xmlData = @simplexml_load_file($url);

print_r($xmlData);


Most likely (although I don't know for sure), the simplexml_load_file function can only be used with local content. For cross-domain requests, you will have to use Ajax and (maybe) cURL.

 

Given that there is an API for the ISBNDB though, that all seems unnecessary. When you perform a search, regardless of how many results there are, you should return that data using the responseXML property, and then scan through the entire document for all the title tags (or whatever tags you want). You can use the responseText property as follows to print out the XML document structure to the screen:

 

document.body.innerHTML = ajax.responseText.replace(/</g, '&lt;').replace(/\n/g, '<br>');
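
If the XML contains ampersands or you want the closing brackets escaped too, a slightly fuller version of the same idea might look like this. It is only a sketch, and xmlToDisplayHtml is a made-up name, not anything from the ISBNdb API:

```javascript
// Hypothetical helper: turn raw XML text into HTML that shows the markup literally.
function xmlToDisplayHtml(xmlText) {
  return xmlText
    .replace(/&/g, '&amp;')      // escape ampersands first so later entities survive
    .replace(/</g, '&lt;')       // show opening angle brackets as text
    .replace(/>/g, '&gt;')       // show closing angle brackets as text
    .replace(/\r?\n/g, '<br>');  // keep the original line breaks visible
}
```

You would then write document.body.innerHTML = xmlToDisplayHtml(ajax.responseText);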

 

Once you have all the XML data in the responseXML property, you can easily display all the book titles and search for a specific title with a for loop.
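
For example, assuming you have already copied the title strings out of responseXML into a plain array, the search could be a simple loop like the one below. findTitles is a name I made up for illustration, and the case-insensitive substring match is just one possible way to compare:

```javascript
// Hypothetical helper: return every title that contains the wanted text.
// titles is assumed to be an array of strings taken from the <Title> elements.
function findTitles(titles, wanted) {
  var needle = wanted.toLowerCase();
  var matches = [];
  for (var i = 0, count = titles.length; i < count; i++) {
    // Compare in lowercase so "modern" also matches "Modern".
    if (titles[i].toLowerCase().indexOf(needle) !== -1) {
      matches.push(titles[i]);
    }
  }
  return matches;
}
```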

 

As for the massive amount of search results, that sounds like a limitation with the search function on the ISBN site. You can create your own filter to cut out stop words like "the" from the search string, but that's up to you.
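
A stop-word filter like that could be sketched roughly as follows. The word list and the stripStopWords name are my own assumptions; you would tune the list to whatever gives the best results from the API:

```javascript
// Hypothetical stop-word list; extend it as needed.
var STOP_WORDS = ['a', 'an', 'and', 'of', 'the'];

// Hypothetical helper: remove stop words from a search string before sending it.
function stripStopWords(searchString) {
  var words = searchString.toLowerCase().split(/\s+/);
  var kept = [];
  for (var i = 0, count = words.length; i < count; i++) {
    // Skip empty pieces (from leading or trailing spaces) and stop words.
    if (words[i] !== '' && STOP_WORDS.indexOf(words[i]) === -1) {
      kept.push(words[i]);
    }
  }
  return kept.join(' ');
}
```

If the function returns an empty string, it is probably best not to send a request at all.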

 

I recommend playing around with the normal search feature on the site to get an idea of how different search strings generate various search results to decide how to filter your search strings.

 

Also, I looked briefly at the API documentation, and it stated that no matter how many search results there are, only 10 are returned at a time, so maybe just always displaying the top 10 results will be sufficient, but again, that's up to you.

 

Without an access key, I can't really help you more than that. Sorry.


I also think this should be something simple, as there is an API. Because of the cross-domain restriction with Ajax, a PHP proxy script is required to load the XML file provided by ISBNdb. That XML can then be used to display data through JavaScript. So, I'm trying to load the XML file through a PHP script.

As an example, I searched for "Modern JavaScript" on the ISBNdb website, which returns 3 results.

Please see below the two different approaches I tried in my PHP script. The script with simplexml_load_file returns only 3 results (as required), but it shows an XML syntax error.

The other script, with file_get_contents, returns more than 50000 records in the XML file. And because only the top 10 results are displayed, I never get "Modern JavaScript Develop and Design"...

 

Script 1:

 

 

header('Content-Type: text/xml');

echo '<?xml version="1.0" encoding="utf-8" standalone="yes" ?>

<item>';

$stitle = $_GET['searchterm'];

$url = "http://isbndb.com/api/books.xml?

 

access_key=p;results=details,texts&index1=title&value1=$stitle";

$xmlData = @simplexml_load_file($url);

print_r($xmlData);

echo '</item>';

 

Script 2:

$url = "http://isbndb.com/api/books.xml?

 

access_key=amp;results=details,texts&index1=title&value1=$stitle";

 

$xml = file_get_contents($url);

$xml = simplexml_load_string($xml);

header("Content-Type: applicaton/xml");

print_r($xml);

 

 

So, I want to use script 1, but I don't know how to parse/encode the incoming XML to convert special characters into XML entities.


I got it now. After a lot of searching about PHP and XML, I found that I had made a simple mistake in building the URL request. Now I get the results I want. Reading through all of this, I also learned a bit about SimpleXML.

Thanks a lot, HartleySan.

I think discussing things here gives me different ways to think about a problem.
