Whats not known how to do properly is the following...
$attendXml = "";
for ($i=0;$i<count($attendData);$i++) {
$attendXml += assocArrayToXML('row',$attendData[$i]);
}
I have written wrong but I think you see what I am trying to do, the code comes from the following program. Retrieving the organXml works okay, the problem occurs with an array (none associative) containing a number of (associative arrays) that's the problem.
How do I merge the XML of each of the associative arrays into one XML differentiated by 'row'.
function assocArrayToXML($root_element_name,$ar)
{
$xml = new SimpleXMLElement("<?xml version=\"1.0\"?><{$root_element_name}> </{$root_element_name}>");
$f = create_function('$f,$c,$a','
foreach($a as $k=>$v) {
if(is_array($v)) {
$ch=$c->addChild($k);
$f($f,$ch,$v);
} else {
$c->addChild($k,$v);
}
}');
$f($f,$xml,$ar);
return $xml->asXML();
}
// Include Libraries
include('services\OrganisationService.php');
include('services\AttendeeService.php');
// Target Organisation
$organ_id = 1;
// Read Organisation Data
$organServ = new OrganisationService();
$organData = $organServ->getOrganisationByID($organ_id);
$organXml = assocArrayToXML('organisation',$organData);
// Read Attendees Data (For Organisation)
$attendServ = new AttendeeService();
$attendData = $attendServ->getAllActiveAttendeeByOrg($organ_id);
$attendXml = "";
for ($i=0;$i<count($attendData);$i++) {
$attendXml += assocArrayToXML('row',$attendData[$i]);
}
//var_dump($attendData);
header ("Content-Type:text/xml");
echo $attendXml;
?>
You need to separate the creation of a nodes in the tree based on an associative array from the creation of the entire xml document string. I also recommend against using create_function to define a recursive function, and instead consider creating a class for handling the XML rendering.
Related
I am trying to scrape a remote website and edit parts of the results before updating a couple of tables in the database and subsequently echo()'ing the final document.
Here's a redacted snippet of the code in question for reference:
<?php
require_once 'backend/connector.php';
require_once 'table_access/simplehtmldom_1_5/simple_html_dom.php';
require_once 'pronunciation1.php';
// retrieve lookup term
if(isset($_POST["lookup_term"])){ $term = trim($_POST["lookup_term"]); }
else { $term = "hombre"; }
$html = file_get_html("http://www.somesite.com/translate/" . rawurlencode($term));
$coll_temp = $html->find('div[id=translate-en]');
$announce = $coll_temp[0]->find('.announcement');
$quickdef = $coll_temp[0]->find('.quickdef');
$meaning = $announce[0] . $quickdef[0];
$html->clear(); // release scraper variable to prevent memory leak issues
unset($html); // release scraper variable to prevent memory leak issues
$meaning = '<?xml version="1.0" encoding="ISO-8859-15"?>' . $meaning;
// process the newly-created DOM
$dom = new DOMDocument;
$dom->loadHTML($meaning);
// various DOM-manipulation code snippets
// extract the quick definition section
foreach ($dom->find('div[class=quickdef]') as $qdd) {
$qdh1 = $qdd->find('.source')[0]->find('h1.source-text');
$qdterm = $qdh1[0]->plaintext;
$qdlang = $qdh1[0]->getAttribute('source-lang');
add2qd($qdterm, $qdd, $qdlang);
unset($qdterm);
unset($qdlang);
unset($qdh1);
}
$finalmeaning = $dom->saveHTML(); // store processed DOM in $finalmeaning
push2db($term, $finalmeaning); // add processed DOM to database
echo $finalmeaning; // output processed DOM
// release variables
unset($dom);
unset($html);
unset($finalmeaning);
function add2qd($lookupterm, $finalqd, $lang){
$connect = dbconn(PROJHOST, CONTEXTDB, PEPPYUSR, PEPPYPWD);
$sql = 'INSERT IGNORE INTO tblquickdef (word, quickdef, lang) VALUES (:word, :quickdef, :lang)';
$query = $connect->prepare($sql);
$query->bindParam(':word', $lookupterm);
$query->bindParam(':quickdef', $finalqd);
$query->bindParam(':lang', $lang);
$query->execute();
$connect = null;
}
function push2db($lookupword, $finalmean) {
$connect = dbconn(PROJHOST, DICTDB, PEPPYUSR, PEPPYPWD);
$sql = 'INSERT IGNORE INTO tbldict (word, mean) VALUES (:word, :mean)';
$query = $connect->prepare($sql);
$query->bindParam(':word', $lookupword);
$query->bindParam(':mean', $finalmean);
$query->execute();
$connect = null;
}
?>
The code works fine except for the for loop under the // extract the quick definition section. The function being called from inside this loop is add2qd() which accepts 3 string values as input.
Every time this loop runs, PHP throws a fatal error because it thinks find() is undefined. I know find is a legitimate function in the PHP Simple HTML DOM Parser library because I have used it multiple times in the same code without any problem (in the //retrieve lookup term section). What am I doing wrong?
But your are not using the PHP Simple HTML DOM - only standard PHP DOMDocument, which does not have the method find.
$dom = new DOMDocument;
$dom->loadHTML($meaning);
foreach ($dom->find('div[class=quickdef]') as $qdd) {
http://php.net/manual/en/class.domdocument.php
I have a program that removes certain pages from a web; i want to then traverse the remaining pages and "unlink" any links to those removed pages. I'm using simplehtmldom. My function takes a source page ($source) and an array of pages ($skipList). It finds the links, and I'd like to then manipulate the dom to convert the element into the $link->innertext, but I don't know how. Any help?
function RemoveSpecificLinks($source, $skipList) {
// $source is the html source file;
// $skipList is an array of link destinations (hrefs) that we want unlinked
$docHtml = file_get_contents($source);
$htmlObj = str_get_html($docHtml);
$links = $htmlObj->find('a');
if (isset($links)) {
foreach ($links as $link) {
if (in_array($link->href, $skipList)) {
$link->href = ''; // Should convert to simple text element
}
}
}
$docHtml = $htmlObj->save();
$htmlObj->clear();
unset($htmlObj);
return($docHtml);
}
I have never used simplehtmldom, but this is what I think should solve your problem:
function RemoveSpecificLinks($source, $skipList) {
// $source is the HTML source file;
// $skipList is an array of link destinations (hrefs) that we want unlinked
$docHtml = file_get_contents($source);
$htmlObj = str_get_html($docHtml);
$links = $htmlObj->find('a');
if (isset($links)) {
foreach ($links as $link) {
if (in_array($link->href, $skipList)) {
$link->outertext = $link->plaintext; // THIS SHOULD WORK
// IF THIS DOES NOT WORK TRY:
// $link->outertext = $link->innertext;
}
}
}
$docHtml = $htmlObj->save();
$htmlObj->clear();
unset($htmlObj);
return($docHtml);
}
Please provide me some feedback as if this worked or not, also specifying which method worked, if any.
Update: Maybe you would prefer this:
$link->outertext = $link->href;
This way you get the link displayed, but not clickable.
I Have a HTML Table
My Parsing Code is
$src = new DOMDocument('1.0', 'utf-8');
$src->formatOutput = true;
$src->preserveWhiteSpace = false;
#$src->loadHTML($result);
$xpath = new DOMXPath($src);
$data=$xpath->query('//td[ contains (#class, "bodytext1") ]');
foreach($data as $datas)
{
echo $datas->nodeValue."<br />";
}
$values=$xpath->query('//tr[ contains (#bgcolor, "f3fafe") ]');
foreach($values as $value)
{
echo $value->nodeValue."<br />";
}
$values1=$xpath->query('//tr[ contains (#bgcolor, "def0fa") ]');
foreach($values1 as $value1)
{
echo $value1->nodeValue."<br />";
}
to be printed, and I want them to be repeated along with other lines as shown above in output i need.
and I want this whole thing in a array so that i can insert it in the database
Can anyone please guide me or give me any hint so that I can do this
This should get you started.
$src = new DOMDocument('1.0', 'utf-8');
$src->formatOutput = true;
$src->preserveWhiteSpace = false;
$src->loadHTML($result);
$xpath = new DOMXPath($src);
// get header data
$data=$xpath->query('//table[1]//td');
$htno = trim(explode(":",$data->item(0)->nodeValue)[1]);
$name = trim(explode(":",$data->item(1)->nodeValue)[1]);
$fatherName=trim(explode(":",$data->item(2)->nodeValue)[1]);
// rows from 2nd table
$values1=$xpath->query('//table[2]//tr');
$header = true; // flag to track whether we've read the header row.
foreach($values1 as $value1)
{
if (!$header) {
$rowdata = str_replace("\r\n"," ",$value1->nodeValue);
echo $htno," ",$name," ",$fatherName," ",$rowdata,"\n";
}
$header = false;
}
Note:
The $header flag is a quick fix. A better Xpath query might eliminate the need for it.
the str_replace near the bottom is ugly but expedient. You might want to play with the xpath query to see if you can improve it.
Output is not formatted for HTML - lines are delimited by \n
I got a warning on one line where it contained &, so I changed it to AND. You might have to preprocess your tables to eliminate those somehow.
you could use third party's dll,such as "Html Agility Pack". a tool which is professional to convert html into xml.
I have a page test.php in which I have a list of names:
name1: 992345
name2: 332345
name3: 558645
name4: 434544
In another page test1.php?id=name2 and the result should be:
332345
I've tried this PHP code:
<?php
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile("/test.php");
$xpath = new DOMXpath($doc);
$elements = $xpath->query("//*#".$_GET["id"]."");
if (!is_null($elements)) {
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
?>
I need to be able to change the name with GET PHP method in test1.pdp?id=name4
The result should be different now.
434544
is there another way, becose mine won't work?
Here is another way to do it.
<?php
libxml_use_internal_errors(true);
/* file function reads your text file into an array. */
$doc = file("test.php");
$id = $_GET["id"];
/* Show your array. You can remove this part after you
* are sure your text file is read correct.*/
echo "Seeking id: $id<br>";
echo "Elements:<pre>";
print_r($doc);
echo "</pre>";
/* this part is searching for the get variable. */
if (!is_null($doc)) {
foreach ($doc as $line) {
if(strpos($line,$id) !== false){
$search = $id.": ";
$replace = '';
echo str_replace($search, $replace, $line);
}
}
} else {
echo "No elements.";
}
?>
There is a completely different way to do this, using PHP combined with JavaScript (not sure if that's what you're after and if it can work with your app, but I'm going to write it). You can change your test.php to read the GET parameter (it can be POST as well, you'll see), and according to that, output only the desired value, probably from the associative array you have hard-coded in there. The JavaScript approach will be different and it would involve making a single AJAX call instead of DOM traversing using PHP.
So, in short: AJAX call to test.php, which then output the desired value based on the GET or POST parameter.
jQuery AJAX here; native JS tutorial here.
Just let me know if this won't work for your app, and I'll delete my answer.
I have made this:
<html>
<head>
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script>
$(document).ready(
function()
{
$("body").html($("#HomePageTabs_cont_3").html());
}
);
</script>
</head>
<body>
<?php
echo file_get_contents("http://www.bankasya.com.tr/index.jsp");
?>
</body>
</html>
When I check my page with Firebug, It gives countless "missing files" (images, css files, js files, etc.) errors. I want to have just a part of the page not of all. This code does what I want. But I am wondering if there is a better way.
EDIT:
The page does what I need. I do not need all the contents. So iframe is useless to me. I just want the raw data of the div #HomePageTabs_cont_3.
Your best bet is PHP server-side parsing. I have written a small snippet to show you how to do this using DOMDocument (and possibly tidyif your server has it, to barf out all the mal-formed XHTML foos).
Caveat: outputs UTF-8. You can change this in the constructor of DOMDocument
Caveat 2: WILL barf out if its input is neither utf-8 not iso-8859-9. The current page's charset is iso-8859-9 and I see no reason why they would change this.
header("content-type: text/html; charset=utf-8");
$data = file_get_contents("http://www.bankasya.com.tr/index.jsp");
// Clean it up
if (class_exists("tidy")) {
$dataTidy = new tidy();
$dataTidy->parseString($data,
array(
"input-encoding" => "iso-8859-9",
"output-encoding" => "iso-8859-9",
"clean" => 1,
"input-xml" => true,
"output-xml" => true,
"wrap" => 0,
"anchor-as-name" => false
)
);
$dataTidy->cleanRepair();
$data = (string)$dataTidy;
}
else {
$do = true;
while ($do) {
$start = stripos($data,'<script');
$stop = stripos($data,'</script>');
if ((is_numeric($start))&&(is_numeric($stop))) {
$s = substr($data,$start,$stop-$start);
$data = substr($data,0,$start).substr($data,($stop+strlen('</script>')));
} else {
$do = false;
}
}
// nbsp breaks it?
$data = str_replace(" "," ",$data);
// Fixes for any element that requires a self-closing tag
if (preg_match_all("/<(link|img)([^>]+)>/is",$data,$mt,PREG_SET_ORDER)) {
foreach ($mt as $v) {
if (substr($v[2],-1) != "/") {
$data = str_replace($v[0],"<".$v[1].$v[2]."/>",$data);
}
}
}
// Barf out the inline JS
$data = preg_replace("/javascript:[^;]+/is","#",$data);
// Barf out the noscripts
$data = preg_replace("#<noscript>(.+?)</noscript>#is","",$data);
// Muppets. Malformed comment = one more regexp when they could just learn to write proper HTML...
$data = preg_replace("#<!--(.*?)--!?>#is","",$data);
}
$DOM = new \DOMDocument("1.0","utf-8");
$DOM->recover = true;
function error_callback_xmlfunction($errno, $errstr) { throw new Exception($errstr); }
$old = set_error_handler("error_callback_xmlfunction");
// Throw out all the XML namespaces (if any)
$data = preg_replace("#xmlns=[\"\']?([^\"\']+)[\"\']?#is","",(string)$data);
try {
$DOM->loadXML(((substr($data, 0, 5) !== "<?xml") ? '<?xml version="1.0" encoding="utf-8"?>' : "").$data);
} catch (Exception $e) {
$DOM->loadXML(((substr($data, 0, 5) !== "<?xml") ? '<?xml version="1.0" encoding="iso-8859-9"?>' : "").$data);
}
restore_error_handler();
error_reporting(E_ALL);
$DOM->substituteEntities = true;
$xpath = new \DOMXPath($DOM);
echo $DOM->saveXML($xpath->query("//div[#id=\"HomePageTabs_cont_3\"]")->item(0));
In order of appearance:
Fetch the data
If we have tidy, sanitize HTML with it
Create a new DOMDocument and load our document ((string)$dataTidy is a short-hand tidy getter)
Create an XPath request path
Use XPath to request all divs with id set as what we want, get the first item of the collection (->item(0), which will be a DOMElement) and request for the DOM to output its XML content (including the tag itself)
Hope it is what you're looking for... Though you might want to wrap it in a function.
Edit
Forgot to mention: http://rescrape.it/rs.php for the actual script output!
Edit 2
Correction, that site is not W3C-valid, and therefore, you'll either need to tidy it up or apply a set of regular expressions to the input before processing. I'm going to see if I can formulate a set to barf out the inconsistencies.
Edit 3
Added a fix for all those of us who do not have tidy.
Edit 4
Couldn't resist. If you'd actually like the values rather than the table, use this instead of the echo:
$d = new stdClass();
$rows = $xpath->query("//div[#id=\"HomePageTabs_cont_3\"]//tr");
$rc = $rows->length;
for ($i = 1; $i < $rc-1; $i++) {
$cols = $xpath->query($rows->item($i)->getNodePath()."/td");
$d->{$cols->item(0)->textContent} = array(
((float)$cols->item(1)->textContent),
((float)$cols->item(2)->textContent)
);
}
I don't know about you, but for me, data works better than malformed tables.
(Welp, that one took a while to write)
I'd get in touch with the remote site's owner and ask if there was a data feed I could use that would just return the content I wanted.
Sébastien answer is the best solution, but if you want to use jquery you can add Base tag in head section of your site to avoid not found errors on images.
<base href="http://www.bankasya.com.tr/">
Also you will need to change your sources to absolute path.
But use DOMDocument