foreach ($filePaths as $filePath) {
/*Open a file, run a function to write a new file
that rewrites the information to meet design specifications */
$fileHandle = fopen($filePath, "r+");
$newHandle = new DOMDocument();
$newHandle->loadHTMLFile( $filePath );
$metaTitle = trim(retrieveTitleText($newHandle));
$pageMeta = array('metaTitle' => $metaTitle, 'pageTitle' => 'Principles of Biology' );
$attributes = retrieveBodyAttributes($filePath);
cleanfile($fileHandle, $filePath);
fclose($fileHandle);
}
function retrieveBodyAttributes($filePath) {
$dom = new DOMDocument;
$dom->loadHTMLFile($filePath);
$p = $dom->getElementsByTagName('body')->item(0);
/*if (!$p->hasAttribute('body')) {
$bodyAttr[] = array('attr'=>" ", 'value'=>" ");
return $bodyAttr;
}*/
if ($p->hasAttributes()) {
foreach ($p->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
$bodyAttr[] = array('attr'=>$name, 'value'=>$value);
}
return $bodyAttr;
}
}
$filePaths is an array of strings. When I run the code, it gives me a "Call to member function hasAttributes() on non-object" error for the line that calls hasAttributes(). When the commented-out block is enabled, I get the same error on the line that calls hasAttribute('body'). I tried a var_dump on $p, on the line just after the call to getElementsByTagName, and I got "object (DOMElement) [5]". The number changed because I was running the code on multiple files at once, and I don't know what the number means. I can't find what I'm doing wrong.
With:
$p = $dom->getElementsByTagName('body')->item(0);
You are executing DOMNodeList::item (see: http://www.php.net/manual/en/domnodelist.item.php), which returns NULL if no element is found at the given index.
But you're not checking for that possibility; you're just expecting $p to be non-null.
Try adding something like:
if ($p instanceof DOMNode) {
// the hasAttributes code
}
Although, if you're sure that there should be a body element, you'll probably have to check your file paths.
This is most likely because there is no <body> tag in your DOMDocument.
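To make the null check above concrete, here is a minimal sketch. It parses an HTML string instead of a file so that it runs on its own; the function name bodyAttributesFromHtml and the sample markup are made up for illustration and are not from the question.

```php
<?php
// Hypothetical helper: returns the <body> attributes, or an empty array
// when the markup cannot be loaded or has no <body> element.
function bodyAttributesFromHtml($html) {
    $dom = new DOMDocument();
    // @ hides parser warnings for sloppy markup; false means nothing was loaded.
    if ($html === '' || !@$dom->loadHTML($html)) {
        return array();
    }
    // item(0) returns null when no <body> exists, so check before using it.
    $body = $dom->getElementsByTagName('body')->item(0);
    if (!$body instanceof DOMElement) {
        return array();
    }
    $bodyAttr = array();
    foreach ($body->attributes as $attr) {
        $bodyAttr[] = array('attr' => $attr->nodeName, 'value' => $attr->nodeValue);
    }
    return $bodyAttr;
}

// Sample markup (an assumption, only for the demo).
$attrs = bodyAttributesFromHtml('<html><body class="page" id="main"><p>Hi</p></body></html>');
print_r($attrs);
```

Returning an empty array in every failure case also means the caller never receives an undefined $bodyAttr when the body has no attributes.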
include('simple_html_dom.php');
function curl_set($url){
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
return $result;
}
$curl_scraped_page = curl_set('http://www.belmontwine.com/site-map.html');
$html = new simple_html_dom();
$html->load($curl_scraped_page, true, false);
$i = 0;
$ab = array();
$files = array();
foreach($html->find('td[class=site-map]') as $td) {
foreach($td->find('li a') as $a) {
if($i<=2){
$ab = 'http://www.belmontwine.com'.$a->href;
$html = file_get_html($ab);
foreach($html->find('td[class=pageheader]') as $file) {
$files[] = $file->innertext;
}
}
else{
//exit();
}
$i++;
}
$html->clear();
}
print_r($files);
Above is my code; I need help scraping a site with PHP.
The $ab variable contains the URLs that are scraped from the site. I want to scrape data from those URLs, but I don't know what's wrong with the script.
The desired output is the URL passed by $ab,
but it is not returning anything, just a continuous loop.
I need help with it.
You have a runaway program because once you are inside the if($i<=2) section you never increment the $i variable. Right now your $i++ is in the wrong place. I don't know why you want to limit the finds to 3 or fewer, but you also need to remember to reset $i to 0, which you are not doing at all.
EDIT:
I don't use the class 'simple_html_dom.php', so I don't know it very well. I also don't know what you want to do with each link found, and I can't do the work for you. I came up with this sample PHP script that grabs all the links from your site-map page. It creates an array consisting of the link title and href path. The last foreach loop just prints the array for now, but you could use that loop to process each path found.
include('simple_html_dom.php');
$files = array();
$html = file_get_html('http://www.belmontwine.com/site-map.html');
foreach($html->find('td[class=site-map]') as $td)
{
foreach($td->find('li a') as $a)
{
if($a->plaintext != '')
{
$files["$a->plaintext"] = "http://www.belmontwine.com/$a->href";
}
}
}
// To print $files array or to process each link found
foreach($files as $title => $path)
{
echo('Title: ' . $title . ' - Path: ' . $path . '<br>' . PHP_EOL);
}
Also, not every link found is an HTML file; at least one is a PDF, so be sure to test for that in your code.
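One way to test for that is to look at the file extension in each URL's path before fetching it. This is only a sketch: the function name looksLikeHtmlPage, the list of accepted extensions, and the example.com URLs are assumptions for illustration, and a stricter check would inspect the Content-Type header of the response.

```php
<?php
// Hypothetical helper: true when the URL path has no extension or an
// extension that usually means an HTML page.
function looksLikeHtmlPage($url) {
    $path = parse_url($url, PHP_URL_PATH);
    $ext  = strtolower(pathinfo((string) $path, PATHINFO_EXTENSION));
    return $ext === '' || in_array($ext, array('html', 'htm', 'php'), true);
}

// Made-up links standing in for the $files array built above.
$files = array(
    'Site Map' => 'http://example.com/site-map.html',
    'Catalog'  => 'http://example.com/catalog.pdf',
);

// array_filter() keeps the title keys and drops the non-HTML links.
$htmlOnly = array_filter($files, 'looksLikeHtmlPage');
print_r($htmlOnly);
```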
I have a PHP file that appends data to an XML file; if the file does not exist, it is supposed to create it. But when I run the script, I get output I didn't expect (shown at the end of the question), and the XML file is not created. The following is the PHP file.
<?php
header('Content-Type: text/xml');
function createXML($doc) {
$markers = $doc->createElement('markers');
$markers = $doc->appendChild($markers);
}
$url = '../../data/data.xml';
$lat = "bla1";
$lng = "bla2";
$address = "bla3";
$doc = new DomDocument();
if (!file_exists($url)){
createXML($doc);
}
else {
$doc->preserveWhiteSpace = FALSE;
$doc->load($url);
}
$markers = $doc->getElementsByTagName('markers')->item(0);
$marker = $doc->createElement('marker');
$marker = $markers->appendChild($marker);
$marker->setAttribute("lat", $lat);
$marker->setAttribute("lng", $lng);
$marker->setAttribute("address", $address);
$doc->formatOutput = true;
$doc->save($url);
?>
When I run the PHP file I get the following output:
Fatal error : Call to a member function appendChild() on a non-object in
/home[...]/saveMarkers.php on line 30
Can you tell me what I have done wrong here? Thank you in advance.
I just tested your code; it is fine and does what is expected. You don't need to use header() if you don't want to echo anything.
Anyway, verify that you have permission to write to the destination directory.
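One case in which the posted code would hit this exact error is a data.xml that already exists but contains no <markers> element, for example an empty file. That is an assumption; the thread does not confirm it. Under that assumption, a minimal sketch that creates the root whenever it is missing could look like this. The helper names getOrCreateMarkers and addMarker and the temporary path are made up for the demo.

```php
<?php
// Hypothetical helper: return the <markers> root, creating it when missing.
function getOrCreateMarkers(DOMDocument $doc, $path) {
    clearstatcache();
    if (is_file($path) && filesize($path) > 0) {
        $doc->preserveWhiteSpace = false;
        $doc->load($path);
    }
    $markers = $doc->getElementsByTagName('markers')->item(0);
    if (!$markers instanceof DOMElement) {
        // Assumes the document has no other root element yet.
        $markers = $doc->appendChild($doc->createElement('markers'));
    }
    return $markers;
}

// Hypothetical helper: append one <marker> and write the file back.
function addMarker($path, $lat, $lng, $address) {
    $doc = new DOMDocument();
    $markers = getOrCreateMarkers($doc, $path);
    $marker = $markers->appendChild($doc->createElement('marker'));
    $marker->setAttribute('lat', $lat);
    $marker->setAttribute('lng', $lng);
    $marker->setAttribute('address', $address);
    $doc->formatOutput = true;
    return $doc->save($path) !== false;
}

// Demo against a temporary file: the first call creates it, the second appends.
$path = sys_get_temp_dir() . '/markers_' . uniqid() . '.xml';
addMarker($path, 'bla1', 'bla2', 'bla3');
addMarker($path, 'bla4', 'bla5', 'bla6');

$check = new DOMDocument();
$check->load($path);
$count = $check->getElementsByTagName('marker')->length;
unlink($path);
echo $count, PHP_EOL;
```

Checking the return value of save() also surfaces the permission problem mentioned above, since save() returns false when the file cannot be written.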
I have several files to parse (with PHP) in order to insert their respective content into different database tables.
First point: the client gave me 6 files. Five are CSV files with comma-separated values; the last one does not come from the same database, and its content is tab-delimited.
I built a FileParser that uses SplFileObject to execute a method on each line of the file content (basically, create an Entity from each dataset and persist it to the database, with Symfony2 and Doctrine2).
But I cannot manage to parse the tab-delimited text file with SplFileObject; it does not split the content into lines as I expect it to.
// In my controller context
$parser = new MyAmazingFileParser();
$parser->parse($filename, $delimitor, function ($data) use ($em) {
$e = new Entity();
$e->setSomething($data[0]);
// [...]
$em->persist($e);
});
// In my parser
public function parse($filename, $delimitor = ',', $run = null) {
if (is_callable($run)) {
$handle = new SplFileObject($filename);
$infos = new SplFileInfo($filename);
if ($infos->getExtension() === 'csv') {
// Everything is going well here
$handle->setCsvControl(',');
$handle->setFlags(SplFileObject::DROP_NEW_LINE + SplFileObject::READ_AHEAD + SplFileObject::SKIP_EMPTY + SplFileObject::READ_CSV);
foreach (new LimitIterator($handle, 1) as $data) {
$result = $run($data);
}
} else {
// Why does the Iterator-way does not work ?
$handle->setCsvControl("\t");
// I have tried with all the possible flags combinations, without success...
foreach (new LimitIterator($handle, 1) as $data) {
// It always only gets the first line...
$result = $run($data);
}
// And the old-memory-killing-dirty-way works ?
$fd = fopen($filename, 'r');
$contents = fread($fd, filesize($filename));
foreach (explode("\t", $contents) as $line) {
// Get all the line as I want... But it's dirty and memory-expensive !
$result = $run($line);
}
}
}
}
It is probably related to the horrible formatting of my client's file, but after a long discussion with them, they really cannot give me another format, for some acceptable reasons (constraints on their side), unfortunately.
The file is currently 49459 lines long, so I really think memory matters at this step. I have to make the SplFileObject approach work, but I do not know how.
An extract of the file can be found here:
Data-extract-hosted
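For reference, this is roughly what the iterator approach looks like when it does work on tab-delimited data. It is a sketch under assumptions: the sample rows are made up, and the file is assumed to use ordinary "\n" line endings. If the real file uses different line endings, that alone could explain why only the first line is returned, but the thread does not confirm it.

```php
<?php
// Made-up sample data standing in for the client's file.
$path = tempnam(sys_get_temp_dir(), 'tsv');
file_put_contents($path, "id\tname\n1\tAlice\n2\tBob\n");

$file = new SplFileObject($path);
// READ_CSV must be set, otherwise each line comes back as a plain string.
$file->setFlags(
    SplFileObject::READ_CSV
    | SplFileObject::READ_AHEAD
    | SplFileObject::SKIP_EMPTY
    | SplFileObject::DROP_NEW_LINE
);
// Tab as delimiter; enclosure and escape are passed explicitly.
$file->setCsvControl("\t", '"', '\\');

$rows = array();
// Skip the header line, then collect one array of fields per line.
foreach (new LimitIterator($file, 1) as $data) {
    $rows[] = $data;
}

// Release the file handle before deleting the temporary file.
$file = null;
unlink($path);
print_r($rows);
```

Only one line is held in memory at a time, so the same loop should stay cheap on a file of tens of thousands of lines.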
I have a function build_additional_docs that calls another function, create_file_node, which performs a few actions. The first thing it does is call read_all_file, which reads the file into a string variable and returns it.
It worked perfectly when create_file_node was called from another function,
but when it is called from build_additional_docs, the client waits for the server until it times out.
I think that the function fails on fgets().
Additional comment: when I call create_file_node with the same files, but with the file name as a static string and without the foreach loop, the code works again.
Here is my code:
function build_additional_docs($dir_name, $addDocsArr){
foreach ($addDocsArr as $doc) {
if($summery != ''){
$fileName = $dir_name . '\\' . $doc;
create_file_node($fileName);
}
}
}
function create_file_node($fileName){
global $base_url;
try{
$text = read_all_file($fileName);
}
catch (Exception $ex){
// some message here
}
return 0;
}
function read_all_file($file_name){
$file_handle = fopen($file_name, "r");
while (!feof($file_handle)) {
$line[] = fgets($file_handle);
}
fclose($file_handle);
return implode('',$line);
}
Found the mistake!
The $addDocsArr variable is the return value of explode(), which splits a string into separate file names. The returned array contained file names with special characters that cannot be seen.
So when I added the code:
$fileName = $dir_name . '\\' . substr($doc, 0,strlen($doc) - 1);
the code worked.
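A slightly more robust variant of the same fix is to trim every name right after explode(), instead of cutting off exactly one character. The sketch below assumes the invisible character was a trailing "\r" from Windows line endings, which the thread does not state; the sample list and the function name read_all_file_checked are made up. The second part guards fopen(): if the handle is not checked, a file that fails to open can leave a while (!feof(...)) loop spinning, which would match the timeout described in the question.

```php
<?php
// Made-up input: a newline-separated list with Windows line endings.
$raw = "report.doc\r\nnotes.txt\r\n";

// explode() on "\n" leaves a trailing "\r" on each name and one empty entry.
$addDocsArr = explode("\n", $raw);

// trim() strips whitespace (including "\r") from both ends; empty names are dropped.
$cleanNames = array_values(array_filter(array_map('trim', $addDocsArr), 'strlen'));
print_r($cleanNames);

// Hypothetical variant of read_all_file that fails fast on an unopened file.
function read_all_file_checked($file_name) {
    $file_handle = @fopen($file_name, 'r');
    if ($file_handle === false) {
        throw new RuntimeException("Cannot open file: $file_name");
    }
    $text = stream_get_contents($file_handle);
    fclose($file_handle);
    return $text;
}
```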
I'm creating a tool that works with file strings, and I need to get the line number where a node is found. That is, I have this:
$dom = new DOMDocument('1.0');
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//text()") as $q) {
// $line = WHAT???
$strings[trim($q->nodeValue)] = $line;
}
and I need to know on which line the string I'm storing in the $strings array begins. Is it possible to get it?
Each DOMNode object has a getLineNo() method that returns this. In your case it's a DOMText object, which extends DOMNode:
foreach ($xpath->query("//text()") as $q) {
$line = $q->getLineNo();
$strings[trim($q->nodeValue)] = $line;
}
You might need to upgrade to PHP 5.3, if you have not already, to make use of that method.