I have following code.
<entry>
<job:location>
<job:id>24</job:id>
<job:region>6</job:region>
</job:location>
</entry>
I've problem with namespaces. How I can read content of job:region tag in SimpleXML.
Try this:
<?php
$entry = simplexml_load_file('entry.xml');
printf("%s\n", $entry->children('job', true)->location->region);
?>
To check the above code in action, click here
For more information about SimpleXml refer to this article
You should register the job namespace, then you can use the registered namespace-prefix in an XPath to select what you want:
$sxe = new SimpleXMLElement($xml);
$sxe->registerXPathNamespace('job', 'http://example.org/you-did-not-provide-the-job-namespaceURI-in-your-example');
$result = $sxe->xpath('//entry/job:location/job:region');
foreach ($result as $location) {
echo $location . "\n";
}
I would do it dynamically.
$xml = #simplexml_load_string($path) // loads your valid xml data
foreach($xml->channel->item as $entry) {
$namespaces = $entry->getNameSpaces(true);
foreach($namespaces as $ns=>$value)
{
$job = $entry->children($namespaces[$ns]);
$author = (string)$job->creator;
if ($author != "")
{
$someVariable = (string) $dc->creator;
}
}
Related
I am trying to scrape this webpage. In this webpage I have to get the job title and its location. Which I am able to get from my code. But the problem is coming that when I am sending it in XML, then only one detail is going from the array list.
I am using goutte CSS selector library and also please tell me how to scrap pagination in goutte CSS selector library.
here is my code:
$httpClient = new \Goutte\Client();
$response = $httpClient->request('GET', 'https://www.simplyhired.com/search?q=pharmacy+technician&l=American+Canyon%2C+CA&job=X5clbvspTaqzIHlgOPNXJARu8o4ejpaOtgTprLm2CpPuoeOFjioGdQ');
$job_posting_location = [];
$response->filter('.LeftPane article .SerpJob-jobCard.card .jobposting-subtitle span.JobPosting-labelWithIcon.jobposting-location span.jobposting-location')
->each(function ($node) use (&$job_posting_location) {
$job_posting_location[] = $node->text() . PHP_EOL;
});
$joblocation = 0;
$response->filter('.LeftPane article .SerpJob-jobCard.card .jobposting-title-container h3 a')
->each( function ($node) use ($job_posting_location, &$joblocation, $httpClient) {
$job_title = $node->text() . PHP_EOL; //job title
$job_posting_location = $job_posting_location[$joblocation]; //job posting location
// display the result
$items = "{$job_title} # {$job_posting_location}\n\n";
global $results;
$result = explode('#', $items);
$results['job_title'] = $result[0];
$results['job_posting_location'] = $result[1];
$joblocation++;
});
function convertToXML($results, &$xml_user_info){
foreach($results as $key => $value){
if(is_array($value)){
$subnode = $xml_user_info->addChild($key);
foreach ($value as $k=>$v) {
$xml_user_info->addChild("$k",htmlspecialchars("$v"));
}
}else{
$xml_user_info->addChild("$key",htmlspecialchars("$value"));
}
}
return $xml_user_info->asXML();
}
$xml_user_info = new SimpleXMLElement('<root/>');
$xml_content = convertToXML($results,$xml_user_info);
$xmlFile = 'details.xml';
$handle = fopen($xmlFile, 'w') or die('Unable to open the file: '.$xmlFile);
if(fwrite($handle, $xml_content)) {
echo 'Successfully written to an XML file.';
}
else{
echo 'Error in file generating';
}
what i got in xml file --
<?xml version="1.0"?>
<root><job_title>Pharmacy Technician
</job_title><job_posting_location> Vallejo, CA
</job_posting_location></root>
what i want in xml file --
<?xml version="1.0"?>
<root>
<job_title>Pharmacy Technician</job_title>
<job_posting_location> Vallejo, CA</job_posting_location>
<job_title>Pharmacy Technician 1</job_title>
<job_posting_location> Vallejo, CA</job_posting_location>
<job_title>Pharmacy Technician New</job_title>
<job_posting_location> Vallejo, CA</job_posting_location>
and so on...
</root>
You overwrite the values in the $results variable. You're would need to do something like this to append:
$results[] = [
'job_title' => $result[0];
'job_posting_location' => $result[1]
];
However here is no need to put the data into an array at all, just create the
XML directly with DOM.
Both your selectors share the same start. Iterate the card and then fetch
related data.
$httpClient = new \Goutte\Client();
$response = $httpClient->request('GET', $url);
$document = new DOMDocument();
// append document element node
$postings = $document->appendChild($document->createElement('jobs'));
// iterate job posting cards
$response->filter('.LeftPane article .SerpJob-jobCard.card')->each(
function($jobCard) use ($document, $postings) {
// fetch data
$location = $jobCard
->filter(
'.jobposting-subtitle span.JobPosting-labelWithIcon.jobposting-location span.jobposting-location'
)
->text();
$title = $jobCard->filter('.jobposting-title-container h3 a')->text();
// append 'job' node to group data in result
$job = $postings->appendChild($document->createElement('job'));
// append data nodes
$job->appendChild($document->createElement('job_title'))->textContent = $title;
$job->appendChild($document->createElement('job_posting_location'))->textContent = $location;
}
);
echo $document->saveXML();
I've been trying unsuccessfully with PHP to loop through two XML files and print the result to the screen. The aim is to take a country's name and output its regions/states/provinces as the case may be.
The first block of code successfully prints all the countries but the loop through both files gives me a blank screen.
The countries file is in the format:
<row>
<id>6</id>
<name>Andorra</name>
<iso2>AD</iso2>
<phone_code>376</phone_code>
</row>
And the states.xml:
<row>
<id>488</id>
<name>Andorra la Vella</name>
<country_id>6</country_id>
<country_code>AD</country_code>
<state_code>07</state_code>
</row>
so that country_id = id.
This gives a perfect list of countries:
$xml = simplexml_load_file("countries.xml");
$xml1 = simplexml_load_file("states.xml");
foreach($xml->children() as $key => $children) {
print((string)$children->name); echo "<br>";
}
This gives me a blank screen except for the HTML stuff on the page:
$xml = simplexml_load_file("countries.xml");
$xml1 = simplexml_load_file("states.xml");
$s = "Jamaica";
foreach($xml->children() as $child) {
foreach($xml1->children() as $child2){
if ($child->id == $child2->country_id && $child->name == $s) {
print((string)$child2->name);
echo "<br>";
}
}
}
Where have I gone wrong?
Thanks.
I suspect your problem is not casting the name to a string before doing your comparison. But why are you starting the second loop before checking if it's needed? You're looping through every single item in states.xml needlessly.
$countries = simplexml_load_file("countries.xml");
$states = simplexml_load_file("states.xml");
$search = "Jamaica";
foreach($countries->children() as $country) {
if ((string)$country->name !== $search) {
continue;
}
foreach($states->children() as $state) {
if ((string)$country->id === (string)$state->country_id) {
echo (string)$state->name . "<br/>";
}
}
}
Also, note that naming your variables in a descriptive manner makes it much easier to figure out what's going on with code.
You could probably get rid of the loops altogether using an XPath query to match the sibling value. I don't use SimpleXML, but here's what it would look like with DomDocument:
$search = "Jamaica";
$countries = new DomDocument();
$countries->load("countries.xml");
$xpath = new DomXPath($countries);
$country = $xpath->query("//row[name/text() = '$search']/id/text()");
$country_id = $country[0]->nodeValue;
$states = new DomDocument();
$states->load("states.xml");
$xpath = new DomXPath($states);
$states = $xpath->query("//row[country_id/text() = '$country_id']/name/text()");
foreach ($states as $state) {
echo $state->nodeValue . "<br/>";
}
I am parsing the following RSS feed (relevant part shown)
<item>
<title>xxx</title>
<link>xxx</link>
<guid>xxx</guid>
<description>xxx</description>
<prx:proxy>
<prx:ip>101.226.74.168</prx:ip>
<prx:port>8080</prx:port>
<prx:type>Anonymous</prx:type>
<prx:ssl>false</prx:ssl>
<prx:check_timestamp>1369199066</prx:check_timestamp>
<prx:country_code>CN</prx:country_code>
<prx:latency>20585</prx:latency>
<prx:reliability>9593</prx:reliability>
</prx:proxy>
<prx:proxy>...</prx:proxy>
<prx:proxy>...</prx:proxy>
<pubDate>xxx</pubDate>
</item>
<item>...</item>
<item>...</item>
<item>...</item>
Using the php code
$proxylist_rss = file_get_contents('http://www.xxx.com/xxx.xml');
$proxylist_xml = new SimpleXmlElement($proxylist_rss);
foreach($proxylist_xml->channel->item as $item) {
var_dump($item); // Ok, Everything marked with xxx
var_dump($item->title); // Ok, title
foreach($item->proxy() as $entry) {
var_dump($entry); //empty
}
}
While I can access everything marked with xxx, I cannot access anything inside prx:proxy - mainly because : cannot be present in valid php varnames.
The question is how to reach prx:ip, as example.
Thanks!
Take a look at SimpleXMLElement::children, you can access the namespaced elements with that.
For example: -
<?php
$xml = '<xml xmlns:prx="http://example.org/">
<item>
<title>xxx</title>
<link>xxx</link>
<guid>xxx</guid>
<description>xxx</description>
<prx:proxy>
<prx:ip>101.226.74.168</prx:ip>
<prx:port>8080</prx:port>
<prx:type>Anonymous</prx:type>
<prx:ssl>false</prx:ssl>
<prx:check_timestamp>1369199066</prx:check_timestamp>
<prx:country_code>CN</prx:country_code>
<prx:latency>20585</prx:latency>
<prx:reliability>9593</prx:reliability>
</prx:proxy>
</item>
</xml>';
$sxe = new SimpleXMLElement($xml);
foreach($sxe->item as $item)
{
$proxy = $item->children('prx', true)->proxy;
echo $proxy->ip; //101.226.74.169
}
Anthony.
I would just strip out the "prx:"...
$proxylist_rss = file_get_contents('http://www.xxx.com/xxx.xml');
$proxylist_rss = str_replace('prx:', '', $proxylist_rss);
$proxylist_xml = new SimpleXmlElement($proxylist_rss);
foreach($proxylist_xml->channel->item as $item) {
foreach($item->proxy as $entry) {
var_dump($entry);
}
}
http://phpfiddle.org/main/code/jsz-vga
Try it like this:
$proxylist_rss = file_get_contents('http://www.xxx.com/xxx.xml');
$feed = simplexml_load_string($proxylist_rss);
$ns=$feed->getNameSpaces(true);
foreach ($feed->channel->item as $item){
var_dump($item);
var_dump($item->title);
$proxy = $item->children($ns["prx"]);
$proxy = $proxy->proxy;
foreach ($proxy as $key => $value){
var_dump($value);
}
}
Here is the URL of the xml source:
I'm tryng to grab all the RichText elements using xpath relative location and then print the elementID attribute. It is outputting nothing though. Any ideas?
<?php
$url = "FXG";
$xml = simplexml_load_file($url);
//print_r($xml);
$textNode = $xml->xpath("//RichText");
$count = count($textNode);
$i = 0;
while($i < $count)
{
echo '<h1>'.$textNode[$i]['s7:elementID'].'</h1>';
$i++;
}
?>
You need to register the namespaces that are set in the xml
$url = "http://testvipd7.scene7.com/is/agm/papermusepress/HOL_12_F_green?&fmt=fxgraw";
$xml = simplexml_load_file($url);
$xml->registerXPathNamespace('default', 'http://ns.adobe.com/fxg/2008');
$xml->registerXPathNamespace('s7', 'http://ns.adobe.com/S7FXG/2008');
$textNode = $xml->xpath("//default:RichText/#s7:elementID");
foreach($textNode as $node) {
echo '<h1>'.$node[elementID].'</h1>';
}
I hope this helps.
Strange. This, however, works.
$textNode = $xml->xpath("//*[name() = 'RichText']");
i'm struggling with Xpath, i have an xml list and i need to get the child data based on the parent id ...
My xml file :
<projecten>
<project id="1">
<titel>Shop 1</titel>
<siteurl>http://test.be</siteurl>
<screenshot>test.jpg</screenshot>
<omschrijving>comment 1</omschrijving>
</project>
<project id="2">
<titel>Shop 2</titel>
<siteurl>http://test2.be</siteurl>
<screenshot>test2.jpg</screenshot>
<omschrijving>comment</omschrijving>
</project>
</projecten>
the code i use to get for example the project 1 data (does not work):
$xmlDoc = new DOMDocument();
$xmlDoc->load(data.xml);
$xpath = new DOMXPath($xmlDoc);
$projectId = '1';
$query = '//projecten/project[#id='.$projectId.']';
$details = $xpath->query($query);
foreach( $details as $detail )
{
echo $detail->titel;
echo $detail->siteurl;
echo $detail->screenshot;
echo $detail->omschrijving;
}
But this does not show anything, if someone can point me out ... thanks
In addition to the solution already given you can also use:
foreach ($xpath->query(sprintf('/projecten/project[#id="%d"]', $id)) as $projectNode) {
echo
$projectNode->getElementsByTagName('titel')->item(0)->nodeValue,
$projectNode->getElementsByTagName('siteurl')->item(0)->nodeValue,
$projectNode->getElementsByTagName('screenshot')->item(0)->nodeValue,
$projectNode->getElementsByTagName('omschrijving')->item(0)->nodeValue;
}
or fetch the DOMText node values directly with Xpath
foreach ($xpath->query(sprintf('/projecten/project[#id="%d"]', $id)) as $projectNode) {
echo
$xpath->evaluate('string(titel)', $projectNode),
$xpath->evaluate('string(siteurl)', $projectNode),
$xpath->evaluate('string(screenshot)', $projectNode),
$xpath->evaluate('string(omschrijving)', $projectNode);
}
or import the node to SimpleXml
foreach ($xpath->query(sprintf('/projecten/project[#id="%d"]', $id)) as $projectNode) {
$detail = simplexml_import_dom($projectNode);
echo
$detail->titel,
$detail->siteurl,
$detail->screenshot,
$detail->omschrijving;
}
or even concatenate all the values directly in the XPath:
$xpath = new DOMXPath($dom);
echo $xpath->evaluate(
sprintf(
'concat(
/projecten/project[#id = %1$d]/titel,
/projecten/project[#id = %1$d]/siteurl,
/projecten/project[#id = %1$d]/screenshot,
/projecten/project[#id = %1$d]/omschrijving
', $id
)
);
Accessing the child nodes as you do:
echo $detail->title;
Is not valid, if you use DOM* functions. This would probably work if you were using SimpleXML.
For DOM* try this:
$dom = new DOMDocument;
$dom->loadXml('<projecten>
<project id="1">
<titel>Shop 1</titel>
<siteurl>http://test.be</siteurl>
<screenshot>test.jpg</screenshot>
<omschrijving>comment 1</omschrijving>
</project>
<project id="2">
<titel>Shop 2</titel>
<siteurl>http://test2.be</siteurl>
<screenshot>test2.jpg</screenshot>
<omschrijving>comment</omschrijving>
</project>
</projecten>
');
$id = 2;
$xpath = new DOMXPath($dom);
foreach ($xpath->query(sprintf('/projecten/project[#id="%s"]', $id)) as $projectNode) {
// repeat this for every needed node
$titleNode = $xpath->query('titel', $projectNode)->item(0);
if ($titleNode instanceof DOMElement) {
echo $titleNode->nodeValue;
}
// or us a loop for all child nodes
foreach ($projectNode->childNodes as $childNode) {
echo $childNode->nodeValue;
}
}