I am currently working on a project to scrape data from a website. I have scraped all the data I need with the following code, but I would like to know the easiest way to output this data to a comma-delimited CSV file. I had originally planned to move all of it into a table and export it that way, but I don't know if that's the most efficient method.
<?php
$html = file_get_contents("http://www.zillow.com/homes/for_sale/Alamance-County-NC/2117_rid/36.181671,-78.943291,35.912411,-79.835243_rect/10_zm/1_fr/");
$DOM = new DOMDocument();
libxml_use_internal_errors(true);
$DOM->loadHTML($html);
$finder = new DomXPath($DOM);
$classname = 'property-address';
$nodes = $finder->query("//*[contains(@class, '$classname')]");
$csv_values = array();
foreach ($nodes as $node) {
$csv_values[] = $node->nodeValue;
}
$handle = fopen("C:\Users\Stephen\Documents\WorkCSV\work.csv", "w");
if (false !== $handle) {
fputcsv($handle, $csv_values);
}
?>
I was able to get all of my data into an array using the code provided by Dave. Also, in fopen I was using backslashes ("\"); after switching to forward slashes ("/") I was able to produce an error I can work with for exporting to CSV.
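You can use fputcsv and iterator_to_array:
$nodes = $finder->query("//*[contains(@class, '$classname')]");
$handle = fopen("/path/to/file.csv", "w");
if (false !== $handle) {
// fputcsv needs string fields, so map each DOMNode to its text first
$values = array_map(function ($node) { return $node->nodeValue; }, iterator_to_array($nodes));
fputcsv($handle, $values);
fclose($handle);
}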
Or like this:
$csv_values = array();
foreach ($nodes as $node) {
$csv_values[] = $node->nodeValue;
}
$handle = fopen("/path/to/file.csv", "w");
if (false !== $handle) {
fputcsv($handle, $csv_values);
}
On the Windows platform, be careful to escape any backslashes used in the path to the file, or use forward slashes.
$handle = fopen("c:\\folder\\file.csv", "r");
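Note that both variants above pass all values to a single fputcsv call, so every address ends up in one CSV row. If one address per row is wanted instead, a minimal sketch could look like the following. The helper name writeRows and the sample addresses are made up for illustration; in the real script the values would come from the nodeValue of each node.

```php
<?php
// Hypothetical helper: writes each value as its own one-column CSV row.
function writeRows(string $path, array $values): int
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    $rows = 0;
    foreach ($values as $value) {
        // fputcsv adds quotes around fields that contain commas or spaces
        fputcsv($handle, [trim($value)], ',', '"', '\\');
        $rows++;
    }
    fclose($handle);
    return $rows;
}

// Sample data standing in for the scraped nodeValue strings
$path = sys_get_temp_dir() . '/work.csv';
$count = writeRows($path, ['123 Main St, Graham, NC', '45 Oak Ave, Burlington, NC']);
```

Building the path with forward slashes, or with sys_get_temp_dir() as here, also sidesteps the backslash escaping issue on Windows.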
Related
Hello, I've got a bunch of divs I'm trying to scrape the content values from, and I've managed to successfully pull out one of the values. However, I've hit a brick wall: I now want to pull out the value that comes after it, inside the code I've already written. I would appreciate any help.
Here is the bit of code I'm currently using.
foreach ($arr as &$value) {
$file = $DOCUMENT_ROOT. $value;
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$xpath = new DOMXpath($doc);
$elements = $xpath->query("//*[contains(@class, 'covGroupBoxContent')]//div[3]//div[2]");
// query() returns false (not null) when the expression is invalid
if ($elements !== false) {
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
$maps = $node->nodeValue;
echo $maps;
}
}
}
}
I simply want them all to have separate outputs that I can echo out.
I recommend you use Simple HTML DOM. Beyond that I need to see a sample of the HTML you are scraping.
If you are scraping a website outside your domain I'd recommend saving the source HTML to a file for review and testing. Some websites combat scraping, thus what you see in the browser is not what your scraper would see.
Also, I'd recommend setting a random user agent via ini_set(). If you need a function for this I have one.
<?php
$html = file_get_html($url);
if ($html) {
$myfile = fopen("testing.html", "w") or die("Unable to open file!");
fwrite($myfile, $html);
fclose($myfile);
}
?>
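The user agent suggestion above can be sketched as follows. This is only an illustration: the two user agent strings are invented placeholders, and the user_agent ini setting only affects requests made through PHP's HTTP stream wrapper (for example file_get_contents on an http:// URL), not cURL.

```php
<?php
// Illustrative list only; these user agent strings are placeholders.
$userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0',
    'Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0',
];

// Pick one at random and make it the default for PHP's HTTP stream wrapper.
$chosen = $userAgents[array_rand($userAgents)];
ini_set('user_agent', $chosen);
```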
I have a large XML file of more than 100 MB. I am reading the file in chunks like this:
$fp = fopen('large.xml', 'r');
while ($data = fread($fp, 4096)) {
The format of the XML is like this:
<PersonalInfo>
<UserDetail>
<FirstName>ABC</FirstName>
<Occupation>Student</Occupation>
<DateOfBirth>08/14/1999</DateOfBirth>
</UserDetail>
<CaseDetail>....</CaseDetail>
<TransactionDetail>....</TransactionDetail>
</PersonalInfo>
<PersonalInfo>
<UserDetail>
<FirstName>XYZ</FirstName>
<Occupation>Student</Occupation>
<DateOfBirth>04/25/1991</DateOfBirth>
</UserDetail>
<CaseDetail>....</CaseDetail>
<TransactionDetail>.....</TransactionDetail>
</PersonalInfo>
<PersonalInfo>
<UserDetail>
<FirstName>DEF</FirstName>
<Occupation>Teacher</Occupation>
<DateOfBirth>05/12/1984</DateOfBirth>
</UserDetail>
<CaseDetail>....</CaseDetail>
<TransactionDetail>...</TransactionDetail>
</PersonalInfo>
I want to include only those records where the Occupation tag is "Student" and write those results to a CSV file.
I have tried preg_match as
preg_match("/<PersonalInfo>(.*?)<\/PersonalInfo>/s", $data, $match);
to select the tags and then look into $match, but it is returning duplicate values (repetition).
First check whether your XML is valid with the help of the following link:
http://www.xmlformatter.net/
If your XML is valid, then do the following:
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;
$dom->load('large.xml');
$tags = $dom->getElementsByTagName('PersonalInfo');
foreach ($tags as $destination) {
// Occupation is nested inside UserDetail, so look it up by tag name
$occupation = $destination->getElementsByTagName('Occupation')->item(0);
if ($occupation !== null && trim($occupation->textContent) === "Student") {
echo "Write code to create csv file";
}
}
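Keep in mind that DOMDocument::load() reads the whole document into memory, which can be costly for a file of more than 100 MB. As an alternative, here is a rough sketch that streams the file with XMLReader and only materializes one PersonalInfo record at a time. The function name exportStudents, the CSV column order, and the assumption that all PersonalInfo elements sit under a single root element are mine, not from the question.

```php
<?php
// Hypothetical helper: streams $xmlPath and writes matching records to $csvPath.
// Assumes the <PersonalInfo> elements are siblings under one root element.
function exportStudents(string $xmlPath, string $csvPath): int
{
    $reader = new XMLReader();
    if (!$reader->open($xmlPath)) {
        throw new RuntimeException("Cannot open $xmlPath");
    }
    $out = fopen($csvPath, 'w');
    if ($out === false) {
        throw new RuntimeException("Cannot open $csvPath for writing");
    }
    $written = 0;

    // Advance to the first <PersonalInfo> element.
    while ($reader->read() && $reader->name !== 'PersonalInfo') {
        // keep reading
    }

    if ($reader->name === 'PersonalInfo') {
        do {
            if ($reader->nodeType !== XMLReader::ELEMENT) {
                continue;
            }
            // Only this single record is parsed into an object.
            $record = new SimpleXMLElement($reader->readOuterXml());
            $user = $record->UserDetail;
            if (trim((string) $user->Occupation) === 'Student') {
                fputcsv($out, [
                    (string) $user->FirstName,
                    (string) $user->Occupation,
                    (string) $user->DateOfBirth,
                ], ',', '"', '\\');
                $written++;
            }
            // next() skips the current subtree and moves to the next sibling record.
        } while ($reader->next('PersonalInfo'));
    }

    fclose($out);
    $reader->close();
    return $written;
}
```

Calling exportStudents('large.xml', 'students.csv') would return the number of rows written.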
I'm creating a tool that works with file strings and I need to get the line number where a node is found. That is, I have this:
$dom = new DOMDocument('1.0');
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//text()") as $q) {
// $line = WHAT???
$strings[trim($q->nodeValue)] = $line;
}
and I need to know on which line the string I'm storing in the $strings array begins. Is it possible to get it?
Each DOMNode object has a getLineNo() function that returns this. In your case it's a DOMText object that extends from DOMNode:
foreach ($xpath->query("//text()") as $q) {
$line = $q->getLineNo();
$strings[trim($q->nodeValue)] = $line;
}
getLineNo() is available from PHP 5.3, so you may need to upgrade if you are on an older version.
I want to query data from the database and write it into an XML file.
For example, I have a table_persons table with name and age columns. I run a MySQL query to get each name and age, and then simply want to put that data (the name and age of each person) into an XML file.
How would you do that? Is it even possible?
I suggest you use DomDocument and file_put_contents to create your XML file.
Something like this:
// Create XML document
$doc = new DomDocument('1.0', 'UTF-8');
// Create root node
$root = $doc->createElement('persons');
$root = $doc->appendChild($root);
// mysql_fetch_assoc() was removed in PHP 7; with mysqli, $result is a mysqli_result
while ($row = mysqli_fetch_assoc($result)) {
// add node for each row
$node = $doc->createElement('person');
$node = $root->appendChild($node);
foreach ($row as $column => $value) {
$columnElement = $doc->createElement($column);
$columnElement = $node->appendChild($columnElement);
$columnValue = $doc->createTextNode($value);
$columnValue = $columnElement->appendChild($columnValue);
}
}
// Complete XML document
$doc->formatOutput = true;
$xmlContent = $doc->saveXML();
// Save to file
file_put_contents('persons.xml', $xmlContent);
<?php
// [snip] database code here
$f = fopen('myxml.xml', 'a+');
while ($row = mysqli_fetch_assoc($resultFromQuery))
{
$str = "<person>
<name>{$row['name']}</name>
<age>{$row['age']}</age>
</person>\n";
fwrite($f, $str);
}
fclose($f);
?>
Assuming you use mysqli, this code works. If not, adapt it to suit. In the fopen function call, the a+ mode opens the file for reading and writing, placing the pointer at the end of the file.
Best of luck.
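One caveat with building XML through string interpolation: a name containing & or < would produce a malformed file. A small sketch of escaping the values first is shown below. The helper name personToXml is hypothetical, and it assumes the row has name and age keys as in the example above.

```php
<?php
// Hypothetical helper: builds one <person> element from a database row,
// escaping characters that are special in XML (&, <, >, quotes).
function personToXml(array $row): string
{
    $name = htmlspecialchars((string) $row['name'], ENT_XML1 | ENT_QUOTES, 'UTF-8');
    $age  = htmlspecialchars((string) $row['age'], ENT_XML1 | ENT_QUOTES, 'UTF-8');
    return "<person>\n<name>{$name}</name>\n<age>{$age}</age>\n</person>\n";
}

// Sample row standing in for the result of mysqli_fetch_assoc()
$xml = personToXml(['name' => 'Tom & Jerry <Jr>', 'age' => 7]);
```

Inside the fetch loop, fwrite($f, personToXml($row)) would then replace the interpolated string. The DomDocument approach handles this escaping on its own through createTextNode.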
$fp = fopen('data.txt', 'r');
$xml = new SimpleXMLElement('<allproperty></allproperty>');
while ($line = fgetcsv($fp)) {
if (count($line) < 4) continue; // skip lines that aren't full
$node = $xml->addChild('aproperty');
$node->addChild('postcode', $line[0]);
$node->addChild('price', $line[1]);
$node->addChild('imagefilename', $line[2]);
$node->addChild('visits', $line[3]);
}
echo $xml->saveXML();
I'm using this script to convert a text file into an XML file, but I want to output it to a file. How can I do this with SimpleXML? Cheers.
The file_put_contents function would do it. The function takes a filename and some content and saves the content to the file.
So, taking your example, you would just need to replace the echo statement with file_put_contents.
$xml = new SimpleXMLElement('<allproperty></allproperty>');
$fp = fopen('data.txt', 'r');
while ($line = fgetcsv($fp)) {
if (count($line) < 4) continue; // skip lines that aren't full
$node = $xml->addChild('aproperty');
$node->addChild('postcode', $line[0]);
$node->addChild('price', $line[1]);
$node->addChild('imagefilename', $line[2]);
$node->addChild('visits', $line[3]);
}
file_put_contents('data_out.xml',$xml->saveXML());
For the record, you can use asXML() for that. I mean, it's right there in the manual, just read it and your life will get easier. (I assume, perhaps asking StackOverflow for basic stuff is easier for some)
Also, and this one is more circumstantial, you don't necessarily need to use addChild() for every child. If there is no child of that name, it can be assigned directly using the object property notation:
$fp = fopen('data.txt', 'r');
$xml = new SimpleXMLElement('<allproperty />');
while ($line = fgetcsv($fp)) {
if (count($line) < 4) continue; // skip lines that aren't full
$node = $xml->addChild('aproperty');
$node->postcode = $line[0];
$node->price = $line[1];
$node->imagefilename = $line[2];
$node->visits = $line[3];
}
$xml->asXML('data.xml');
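As far as I know there is one more practical difference between the two notations: property assignment escapes a bare ampersand in the value, whereas addChild() expects the value to be escaped already. The sketch below illustrates the property notation together with asXML() writing to a file. The sample rows and the output path are invented for the example and stand in for the lines read by fgetcsv.

```php
<?php
// Sample rows standing in for what fgetcsv() would return from data.txt
$rows = [
    ['AB1 & CD2', '250000', 'house1.jpg', '12'],
    ['too', 'short'], // skipped: fewer than 4 fields
];

$xml = new SimpleXMLElement('<allproperty />');
foreach ($rows as $line) {
    if (count($line) < 4) continue; // skip lines that aren't full
    $node = $xml->addChild('aproperty');
    // Property assignment creates the child and escapes the ampersand
    $node->postcode = $line[0];
    $node->price = $line[1];
    $node->imagefilename = $line[2];
    $node->visits = $line[3];
}

// With a filename argument, asXML() writes the file and returns true on success
$outPath = sys_get_temp_dir() . '/data_out.xml';
$saved = $xml->asXML($outPath);
```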