I am trying to scrape a remote website, edit parts of the result, update a couple of tables in the database, and then echo() the final document.
Here's a redacted snippet of the code in question for reference:
<?php
require_once 'backend/connector.php';
require_once 'table_access/simplehtmldom_1_5/simple_html_dom.php';
require_once 'pronunciation1.php';
// retrieve lookup term
if(isset($_POST["lookup_term"])){ $term = trim($_POST["lookup_term"]); }
else { $term = "hombre"; }
$html = file_get_html("http://www.somesite.com/translate/" . rawurlencode($term));
$coll_temp = $html->find('div[id=translate-en]');
$announce = $coll_temp[0]->find('.announcement');
$quickdef = $coll_temp[0]->find('.quickdef');
$meaning = $announce[0] . $quickdef[0];
$html->clear(); // release scraper variable to prevent memory leak issues
unset($html); // release scraper variable to prevent memory leak issues
$meaning = '<?xml version="1.0" encoding="ISO-8859-15"?>' . $meaning;
// process the newly-created DOM
$dom = new DOMDocument;
$dom->loadHTML($meaning);
// various DOM-manipulation code snippets
// extract the quick definition section
foreach ($dom->find('div[class=quickdef]') as $qdd) {
$qdh1 = $qdd->find('.source')[0]->find('h1.source-text');
$qdterm = $qdh1[0]->plaintext;
$qdlang = $qdh1[0]->getAttribute('source-lang');
add2qd($qdterm, $qdd, $qdlang);
unset($qdterm);
unset($qdlang);
unset($qdh1);
}
$finalmeaning = $dom->saveHTML(); // store processed DOM in $finalmeaning
push2db($term, $finalmeaning); // add processed DOM to database
echo $finalmeaning; // output processed DOM
// release variables
unset($dom);
unset($html);
unset($finalmeaning);
function add2qd($lookupterm, $finalqd, $lang){
$connect = dbconn(PROJHOST, CONTEXTDB, PEPPYUSR, PEPPYPWD);
$sql = 'INSERT IGNORE INTO tblquickdef (word, quickdef, lang) VALUES (:word, :quickdef, :lang)';
$query = $connect->prepare($sql);
$query->bindParam(':word', $lookupterm);
$query->bindParam(':quickdef', $finalqd);
$query->bindParam(':lang', $lang);
$query->execute();
$connect = null;
}
function push2db($lookupword, $finalmean) {
$connect = dbconn(PROJHOST, DICTDB, PEPPYUSR, PEPPYPWD);
$sql = 'INSERT IGNORE INTO tbldict (word, mean) VALUES (:word, :mean)';
$query = $connect->prepare($sql);
$query->bindParam(':word', $lookupword);
$query->bindParam(':mean', $finalmean);
$query->execute();
$connect = null;
}
?>
The code works fine except for the foreach loop under the // extract the quick definition section comment. The function called inside this loop, add2qd(), accepts three string values as input.
Every time this loop runs, PHP throws a fatal error saying that find() is undefined. I know find() is a legitimate method in the PHP Simple HTML DOM Parser library, because I have used it several times in the same code without any problem (in the // retrieve lookup term section). What am I doing wrong?
But you are not using the PHP Simple HTML DOM Parser here, only PHP's standard DOMDocument class, which does not have a find() method:
$dom = new DOMDocument;
$dom->loadHTML($meaning);
foreach ($dom->find('div[class=quickdef]') as $qdd) {
See the DOMDocument class reference: http://php.net/manual/en/class.domdocument.php
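Since DOMDocument has no find(), the closest built-in equivalent for CSS-like lookups is DOMXPath. The sketch below rewrites the quick-definition loop that way. It assumes the markup matches the selectors used in the question (div.quickdef > .source > h1.source-text with a source-lang attribute); the sample HTML is made up for illustration. Because add2qd() binds its arguments as strings, the node is serialized with saveHTML($qdd) before it would be stored. (Alternatively, you could keep using Simple HTML DOM by parsing $meaning with the library's str_get_html().)

```php
<?php
// Hypothetical sample standing in for the scraped $meaning markup.
$meaning = '<div class="quickdef"><div class="source">'
    . '<h1 class="source-text" source-lang="es">hombre</h1></div></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about non-standard attributes
$dom->loadHTML($meaning);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// XPath equivalent of a CSS class selector (matches "quickdef" among several classes)
$hasClass = fn(string $c) => "contains(concat(' ', normalize-space(@class), ' '), ' $c ')";

$results = [];
foreach ($xpath->query("//div[{$hasClass('quickdef')}]") as $qdd) {
    // Equivalent of ->find('.source')[0]->find('h1.source-text')[0], relative to $qdd
    $h1 = $xpath->query(".//*[{$hasClass('source')}]//h1[{$hasClass('source-text')}]", $qdd)->item(0);
    if ($h1 === null) {
        continue; // no heading in this block
    }
    $qdterm = $h1->textContent;              // DOM counterpart of ->plaintext
    $qdlang = $h1->getAttribute('source-lang');
    $qdhtml = $dom->saveHTML($qdd);          // serialize the node to a string
    // In the real script: add2qd($qdterm, $qdhtml, $qdlang);
    $results[] = [$qdterm, $qdlang, $qdhtml];
}
```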
When I open a URL like this one (it contains a quote) in the browser, everything works fine and the output is valid XML:
http://api.anghami.com/rest/v1/GETsearch.view?sid=11754134061397734622103190992&query=Can't Remember to Forget You Shakira&searchtype=SONG&ook&songCount=1
But when I try to call it from a PHP file:
$url = "http:/api.anghami.com/rest/v1/GETsearch.view?sid=11754134061397734622103190992&query=Can't Remember to Forget You Shakira&searchtype=SONG&ook&songCount=1";
//using DOMDocument for parsing.
$data = new DOMDocument();
// loading the xml from Anghami API.
if($data->load("$url")){// Getting the Tag song.
foreach ($data->getElementsByTagName('song') as $searchNode)
{
$count++;
$n++;
//Getting the information of Anghami Song from the XML file.
$valueID = $searchNode->getAttribute('id');
$titleAnghami = $searchNode->getAttribute('title');
$album = $searchNode->getAttribute('album');
$albumID = $searchNode->getAttribute('albumID');
$artistAnghami = $searchNode->getAttribute('artist');
$track = $searchNode->getAttribute('track');
$year = $searchNode->getAttribute('year');
$coverArt = $searchNode->getAttribute('coverArt');
$ArtistArt = $searchNode->getAttribute('ArtistArt');
$size = $searchNode->getAttribute('size');
}
}
I get this error:
'Warning: DOMDocument::load(): I/O warning : failed to load external entity /var/www/html/http:/api.anghami.com/rest/v1/GETsearch.view?sid=11754134061397734622103190992&query=Can't Remember to Forget You Shakira&searchtype=SONG&ook&songCount=1" in /var/www/html/search.php on line 93'
Can anyone help please?
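@Fracsi is correct: the URL needs to start with http://, not http:/. With a single slash, load() treats the string as a relative file path, which is why the warning shows it resolved against /var/www/html/.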
The other problem is that the XML has a default namespace (defined with the xmlns attribute on the root element), so you need to use
$data->getElementsByTagNameNS('http://api.anghami.com/rest/v1', 'song')
to select all the "song" elements.
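Putting both fixes together, here is a sketch that also percent-encodes the query string, since the search term contains spaces and an apostrophe that a browser encodes automatically but PHP does not. It uses http_build_query() with PHP_QUERY_RFC3986. The helper name buildSearchUrl() is hypothetical, the bare ook flag from the original URL is left out (append it if the API needs it), and the sample response, including its root element name, is invented so the parsing step can run without network access.

```php
<?php
// Hypothetical helper: builds the search URL with an encoded query string.
function buildSearchUrl(string $sid, string $query): string
{
    $params = [
        'sid'        => $sid,
        'query'      => $query,
        'searchtype' => 'SONG',
        'songCount'  => 1,
    ];
    // PHP_QUERY_RFC3986 encodes via rawurlencode(): space -> %20, ' -> %27
    return 'http://api.anghami.com/rest/v1/GETsearch.view?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

$url = buildSearchUrl('11754134061397734622103190992', "Can't Remember to Forget You Shakira");

// Invented sample response using the default namespace from the answer.
$sampleXml = '<?xml version="1.0"?>'
    . '<response xmlns="http://api.anghami.com/rest/v1">'
    . '<song id="42" title="Can\'t Remember to Forget You" artist="Shakira"/>'
    . '</response>';

$data = new DOMDocument();
$songs = [];
// In the real script this would be $data->load($url)
if ($data->loadXML($sampleXml)) {
    foreach ($data->getElementsByTagNameNS('http://api.anghami.com/rest/v1', 'song') as $searchNode) {
        $songs[] = [
            'id'     => $searchNode->getAttribute('id'),
            'title'  => $searchNode->getAttribute('title'),
            'artist' => $searchNode->getAttribute('artist'),
        ];
    }
}
```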
I'm trying to write an Android app that sends and receives XML between the app and a web service. When I run the following code:
$dom = new domDocument;
$dom = simplexml_load_file('php://input');
$xml = simplexml_import_dom($dom);
$messages = Messages::find_by_sql("SELECT * FROM messages WHERE reciever = '$xml->userName'");
$xmlString = "";
if($messages)
{
foreach($messages as $message)
{
$ts = strtotime($message->ts);
$xmlString=$xmlString."<Message><sender>".$message->sender."</sender><reciever>".$message->reciever."</reciever><timestamp>"."123"."</timestamp><text>".$message->text."</text></Message>";
}
}
else
{
//do something
}
$xmlReturn = new DOMDocument('1.0', 'UTF-8');
$xmlReturn->loadXML($xmlString);
echo($xmlReturn->saveXML());
?>
I get the warning "Extra content at the end of the document".
The error comes from this line: $xmlReturn->loadXML($xmlString);
I'm not 100% sure that you can create an XML document by loading a string, but I've seen similar things done, and the string it outputs looks like valid XML to me.
An XML document can have only one root element. You are stringing together multiple <Message>…</Message> elements here, so the root element that should enclose them is missing.
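The quickest fix is to wrap the concatenated string, e.g. loadXML('<Messages>' . $xmlString . '</Messages>'). A sturdier option, sketched below, builds the document with DOM methods so there is always exactly one root and message text containing & or < cannot break the XML. The root name Messages and the sample rows are assumptions; in the real script the rows come from Messages::find_by_sql(), and the timestamp keeps the question's placeholder value "123".

```php
<?php
// Hypothetical rows standing in for the result of Messages::find_by_sql().
$messages = [
    (object) ['sender' => 'alice', 'reciever' => 'bob', 'text' => 'Fish & chips?'],
    (object) ['sender' => 'carol', 'reciever' => 'bob', 'text' => 'See you <soon>'],
];

$xmlReturn = new DOMDocument('1.0', 'UTF-8');
// One root element ("Messages" is an assumed name) wraps every <Message>.
$root = $xmlReturn->appendChild($xmlReturn->createElement('Messages'));

foreach ($messages as $message) {
    $node = $root->appendChild($xmlReturn->createElement('Message'));
    $fields = [
        'sender'    => $message->sender,
        'reciever'  => $message->reciever,   // element name kept as in the question
        'timestamp' => '123',
        'text'      => $message->text,
    ];
    foreach ($fields as $name => $value) {
        $child = $node->appendChild($xmlReturn->createElement($name));
        // createTextNode() escapes &, < and > on output, unlike string concatenation
        $child->appendChild($xmlReturn->createTextNode($value));
    }
}

$output = $xmlReturn->saveXML();
echo $output;
```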
I am using the PHP Simple DOM parser to extract all of the image sources on a given page like so:
// Include the library
include('simple_html_dom.php');
// Retrieve the DOM from a given URL
$html = file_get_html('http://google.com/');
// Retrieve all images and print their SRCs
foreach($html->find('img') as $e)
echo $e->src . '<br>';
Instead of Google.com, I want to use a page in the WordPress admin (backend) area. These pages are PHP pages rather than static HTML files, although they output standard HTML. How would I use the current page as the $html variable? PHP newbie here.
You can do this with the dxtool library (WebGet.php):
Logging in
require 'WebGet.php';
$w = new WebGet();
// using cache to prevent repetitive download
$w->useCache = true;
$w->cacheLocation = '/tmp';
$w->cacheMaxAge = 3600;
$w->cookieFile = '/tmp/cookie.txt';
// $login_get_data and $login_post_data are associative arrays
$login = $w->requestContent($login_url, $login_get_data, $login_post_data);
Visiting the page that contains the images
// $image_page_url is the url of the page where your images exist.
$image_page = $w->requestContent($image_page_url);
Parsing and displaying the images
$dom = new DOMDocument();
$dom->loadHTML($image_page);
$imgs = $dom->getElementsByTagName("img");
foreach($imgs as $img){
echo $img->getAttribute("src") . "<br>";
}
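If the HTML you want to parse is produced by PHP code running in the same request (rather than fetched from another URL as above), another option, not part of the dxtool answer, is to capture that output with output buffering and parse the resulting string. In the sketch below, renderAdminPage() is a hypothetical stand-in for whatever prints the admin page; a real WordPress admin screen would also require the user to be logged in.

```php
<?php
// Hypothetical stand-in for the code that prints the admin page's HTML.
function renderAdminPage(): void
{
    echo '<div class="wrap"><h1>Media</h1>';
    echo '<img src="/wp-content/uploads/photo-1.jpg" alt="">';
    echo '<p>Some text</p><img src="/wp-content/uploads/photo-2.png" alt="">';
    echo '</div>';
}

// Capture the output into a string instead of sending it to the browser.
ob_start();
renderAdminPage();
$image_page = ob_get_clean();

$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world pages often trigger parser warnings
$dom->loadHTML($image_page);
libxml_clear_errors();

$srcs = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    $srcs[] = $img->getAttribute('src');
}
echo implode('<br>', $srcs);
```

If you prefer to stay with Simple HTML DOM, the library's str_get_html() accepts such a string in place of file_get_html().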
Disclaimer: I am the author of this class
I need to fetch the image from a remote page. I tried XPath, but I was told it won't work because img has no nodeValue.
Then I was advised to use getAttribute(), but I don't know how to get it working.
Any suggestions?
This is my code:
<?php
libxml_use_internal_errors(true);
//Setting content type to xml!
header('Content-type: application/xml');
//GET parameter name is bWV0aG9k
$url_prefix = $_GET['bWV0aG9k'];
$url_http_request_encode = strpos($url_prefix, "http://");
//Checking to see if url has a http prefix
if($url_http_request_encode === false){
//does not have, add it!
$fetchable_url_link_consistancy_remote_data = "http://".$url_prefix;
}
else
{
//has it, use it unchanged
$fetchable_url_link_consistancy_remote_data = $url_prefix;
}
//Creating a new DOM document
$page = new DOMDocument();
//Loading the requested file
$page->loadHTMLFile($fetchable_url_link_consistancy_remote_data);
//Initializing XPath
$xpath = new DOMXPath($page);
//Search expressions
//Searching for the title element
$query = "//title";
//Searching for paragraph elements
$query1 = "//p";
//Searching for image elements (thumbnails)
$query2 = "//img";
//Running the XPath queries for later use
$title = $xpath->query($query);
$paragraph = $xpath->query($query1);
$images = $xpath->query($query2);
echo "<remotedata>";
//Echoing the attributes
echo "<title-render>".$title->item(0)->nodeValue."</title-render>";
echo "<paragraph>".$paragraph->item(0)->nodeValue."</paragraph>";
echo "<image_link>".$images->item(0)->nodeValue."</image_link>";
echo "</remotedata>";
?>
You should read the src attribute of the img element instead:
$images->item(0)->getAttribute('src');
If this is ordinary XHTML, an img element has no text content, so its nodeValue is empty; what you need is the value of img/@src.
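Both approaches can be seen side by side in this sketch, which loads a made-up HTML string instead of the remote URL: read the attribute from the img element with getAttribute(), or select the attribute node itself with //img/@src, whose nodeValue is the attribute value. Because the script emits XML, the URL is also escaped with htmlspecialchars(), since image URLs often contain &.

```php
<?php
// Made-up page standing in for the remote document.
$html = '<html><head><title>Demo</title></head><body>'
    . '<p>First paragraph</p><img src="/thumb.jpg?w=100&amp;h=80" alt=""></body></html>';

$page = new DOMDocument();
libxml_use_internal_errors(true);
$page->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($page);

// Option 1: select the <img> element, then read its src attribute.
$img = $xpath->query('//img')->item(0);
$src1 = $img !== null ? $img->getAttribute('src') : '';

// Option 2: select the attribute node directly; its nodeValue is the attribute value.
$attr = $xpath->query('//img/@src')->item(0);
$src2 = $attr !== null ? $attr->nodeValue : '';

// Escape before embedding in the XML response.
echo '<image_link>' . htmlspecialchars($src1, ENT_XML1) . '</image_link>';
```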
What I don't know how to do properly is the following:
$attendXml = "";
for ($i=0;$i<count($attendData);$i++) {
$attendXml += assocArrayToXML('row',$attendData[$i]);
}
I know this is written wrong, but I think you can see what I am trying to do; the code comes from the program below. Retrieving $organXml works fine; the problem is the (non-associative) array that contains a number of associative arrays.
How do I merge the XML of each associative array into one XML document, with each array wrapped in its own 'row' element?
function assocArrayToXML($root_element_name,$ar)
{
$xml = new SimpleXMLElement("<?xml version=\"1.0\"?><{$root_element_name}> </{$root_element_name}>");
$f = create_function('$f,$c,$a','
foreach($a as $k=>$v) {
if(is_array($v)) {
$ch=$c->addChild($k);
$f($f,$ch,$v);
} else {
$c->addChild($k,$v);
}
}');
$f($f,$xml,$ar);
return $xml->asXML();
}
// Include Libraries
include('services\OrganisationService.php');
include('services\AttendeeService.php');
// Target Organisation
$organ_id = 1;
// Read Organisation Data
$organServ = new OrganisationService();
$organData = $organServ->getOrganisationByID($organ_id);
$organXml = assocArrayToXML('organisation',$organData);
// Read Attendees Data (For Organisation)
$attendServ = new AttendeeService();
$attendData = $attendServ->getAllActiveAttendeeByOrg($organ_id);
$attendXml = "";
for ($i=0;$i<count($attendData);$i++) {
$attendXml += assocArrayToXML('row',$attendData[$i]);
}
//var_dump($attendData);
header ("Content-Type:text/xml");
echo $attendXml;
?>
You need to separate creating the tree's nodes from an associative array from creating the entire XML document string. I also recommend against using create_function to define a recursive function (it is deprecated as of PHP 7.2 and removed in PHP 8.0); instead, consider creating a class to handle the XML rendering.
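A minimal sketch of that separation, assuming flat or nested associative arrays with valid element names as keys: appendArray() only adds child nodes to an existing element, and rowsToXml() creates the single document and one row element per array. The class name, the root name attendees, and the sample data are hypothetical. Note also that += in the original loop is numeric addition in PHP (string concatenation is .=), and even .= would produce several XML declarations and root elements, since each asXML() call returns a complete document.

```php
<?php
// Hypothetical renderer: node creation is kept apart from document creation.
class XmlRenderer
{
    // Recursively add the keys/values of $data as children of $parent.
    public static function appendArray(SimpleXMLElement $parent, array $data): void
    {
        foreach ($data as $key => $value) {
            if (is_array($value)) {
                self::appendArray($parent->addChild($key), $value);
            } else {
                // addChild() does not escape "&", so escape the value first
                $parent->addChild($key, htmlspecialchars((string) $value, ENT_XML1));
            }
        }
    }

    // Build one document: <$rootName><$rowName>...</$rowName>...</$rootName>
    public static function rowsToXml(string $rootName, string $rowName, array $rows): string
    {
        $xml = new SimpleXMLElement("<?xml version=\"1.0\"?><{$rootName}/>");
        foreach ($rows as $row) {
            self::appendArray($xml->addChild($rowName), $row);
        }
        return $xml->asXML();
    }
}

// Sample data standing in for getAllActiveAttendeeByOrg($organ_id).
$attendData = [
    ['name' => 'Ann', 'email' => 'ann@example.com'],
    ['name' => 'Bob & Co', 'email' => 'bob@example.com'],
];
$attendXml = XmlRenderer::rowsToXml('attendees', 'row', $attendData);
echo $attendXml;
```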