Isolate a single item from XML feed with PHP

Hi and thanks for looking.
I'm trying to get just the value in brackets after 1 GBP = USD from the following feed. How do I do this with SimpleXML?
http://www.sloomedia.com/currency/feeds/GBP.xml
Thanks,
Ben.

You can try with XPath:
$xml = new SimpleXMLElement($string);
/* Look for <item><title> anywhere in the document */
$result = $xml->xpath('//item/title');
foreach ($result as $node) {
    $matches = array();
    // cast to string: $node is a SimpleXMLElement
    if (preg_match('/\((.*)\)/', (string) $node, $matches)) {
        // do whatever you need with the result
        // echo $matches[1];
    }
}
I wrote the regexp quickly and haven't tested it. You can find something better, or use substr(), as the length of '1 GBP = XXX' is constant in your file.

In order to "isolate" (or rather, select) the right node, you can use the XPath function starts-with().
After selecting the node, you still have to parse its contents. For that, you can use any kind of string-manipulation function in PHP.
$rss = simplexml_load_file('http://www.sloomedia.com/currency/feeds/GBP.xml');
$nodes = $rss->xpath('//item[starts-with(title, "1 GBP = USD")]');
if (empty($nodes))
{
die('No GBP = USD nodes');
}
// Now that you have the right node in $nodes[0], you just have to extract the value in parentheses
// The simple way -- 13 is the number of characters in "1 GBP = USD ("
$usd = substr($nodes[0]->title, 13, -1);
// The flexible way
if (!preg_match('#\\(([0-9]+\\.[0-9]+)\\)#', $nodes[0]->title, $m))
{
die('Could not parse the value');
}
$usd = $m[1];
// If you want to parse every item
foreach ($rss->channel->item as $item)
{
if (!preg_match('#1 ([A-Z]+) = ([A-Z]+) \\(([0-9]+\\.[0-9]+)\\)#', $item->title, $m))
{
echo 'Could not parse ', $item->title, "\n";
continue;
}
echo '1 ', $m[1], ' = ', $m[3], ' ', $m[2], "\n";
}
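Since the feed URL may not always be reachable, here is a minimal sketch of the same starts-with() selection run against an inline sample. The sample XML and its rates are made up for illustration; only the "1 GBP = XXX (rate)" title shape comes from the question.

```php
<?php
// Hypothetical stand-in for the real feed: the element layout and the rates
// are assumptions, only the title format follows the question.
$feed = <<<XML
<rss version="2.0">
  <channel>
    <item><title>1 GBP = EUR (1.1712)</title></item>
    <item><title>1 GBP = USD (1.5432)</title></item>
  </channel>
</rss>
XML;

$rss = simplexml_load_string($feed);

// Select only the item whose title starts with the wanted currency pair
$nodes = $rss->xpath('//item[starts-with(title, "1 GBP = USD")]');

$usd = null;
if (!empty($nodes)
    && preg_match('#\(([0-9]+\.[0-9]+)\)#', (string) $nodes[0]->title, $m)) {
    // $m[1] holds the digits between the parentheses
    $usd = (float) $m[1];
}

var_dump($usd);
```

Casting the title to a string before matching keeps the code independent of SimpleXML's implicit string conversion.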

Related

Php variable into a XML request string

I have the code below, which extracts the artist name from an XML file using the artist's reference code.
<?php
$dom = new DOMDocument();
$dom->load('http://www.bookingassist.ro/test.xml');
$xpath = new DOMXPath($dom);
echo $xpath->evaluate('string(//Artist[ArtistCode = "COD Artist"] /ArtistName)');
?>
The code that pulls the artist code based on a search:
<?php echo $Artist->artistCode ?>
My question:
Can I insert the variable generated by the PHP code into the XML request string?
If so, could you please advise where I should start reading?
Thanks

You mean the XPath expression. Yes, you can; it is "just a string".
$expression = 'string(//Artist[ArtistCode = "'.$Artist->artistCode.'"]/ArtistName)';
echo $xpath->evaluate($expression);
But you have to make sure that the result is valid XPath and that your value does not break the string literal. I wrote a function for a library some time ago that prepares a string this way.
The problem in XPath 1.0 is that there is no way to escape special characters. If your string contains the quote character you're using in the XPath expression, it breaks the expression. The function uses whichever quote character does not occur in the string or, if both occur, splits the string and puts the parts into a concat() call.
function quoteXPathLiteral($string) {
$string = str_replace("\x00", '', $string);
$hasSingleQuote = FALSE !== strpos($string, "'");
if ($hasSingleQuote) {
$hasDoubleQuote = FALSE !== strpos($string, '"');
if ($hasDoubleQuote) {
$result = '';
preg_match_all('("[^\']*|[^"]+)', $string, $matches);
foreach ($matches[0] as $part) {
$quoteChar = (substr($part, 0, 1) == '"') ? "'" : '"';
$result .= ", ".$quoteChar.$part.$quoteChar;
}
return 'concat('.substr($result, 2).')';
} else {
return '"'.$string.'"';
}
} else {
return "'".$string."'";
}
}
The function generates the needed XPath.
$expression = 'string(//Artist[ArtistCode = '.quoteXPathLiteral($Artist->artistCode).']/ArtistName)';
echo $xpath->evaluate($expression);
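To see the quoting at work, here is a small self-contained sketch. It repeats the same quoting strategy in compact form so that it runs on its own, and it uses a made-up document and artist code (both are assumptions, not the real test.xml) that contain both kinds of quotes.

```php
<?php
// Same strategy as the function above, restated compactly for a standalone run.
function quoteXPathLiteral($string) {
    $string = str_replace("\x00", '', $string);
    if (false === strpos($string, "'")) {
        return "'" . $string . "'";
    }
    if (false === strpos($string, '"')) {
        return '"' . $string . '"';
    }
    // Both quote types occur: split into runs and glue them with concat()
    preg_match_all('("[^\']*|[^"]+)', $string, $matches);
    $parts = array();
    foreach ($matches[0] as $part) {
        $quoteChar = (substr($part, 0, 1) == '"') ? "'" : '"';
        $parts[] = $quoteChar . $part . $quoteChar;
    }
    return 'concat(' . implode(', ', $parts) . ')';
}

// Hypothetical sample data: an artist code containing ' and "
$code = 'O\'Neil "X"';
$dom = new DOMDocument();
$dom->loadXML(
    '<Artists><Artist><ArtistCode>O\'Neil "X"</ArtistCode>'
    . '<ArtistName>Sample Artist</ArtistName></Artist></Artists>'
);
$xpath = new DOMXPath($dom);

$literal = quoteXPathLiteral($code);
$name = $xpath->evaluate('string(//Artist[ArtistCode = ' . $literal . ']/ArtistName)');

echo $literal, "\n", $name, "\n";
```

Pasting the same code into the expression between plain double quotes would end the string literal early and make the expression invalid.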

Find an element in HTML and explode it for stock

I want to retrieve an HTML element in a page.
<h2 id="resultCount" class="resultCount">
<span>
Showing 1 - 12 of 40,923 Results
</span>
</h2>
I need to get the total number of results for a test in my PHP code.
For now, I get everything between the h2 tags and explode it on spaces.
Then I explode the count again on the comma and concatenate the pieces, so that the number no longer contains the thousands separator (European format). Once that is done, I test the number of results.
define("MAX_RESULT_ALL_PAGES", 1200);
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
$htmlResultCountPage = file_get_html($queryUrl);
$htmlResultCount = $htmlResultCountPage->find("h2[id=resultCount]");
$resultCountArray = explode(" ", $htmlResultCount[0]);
$explodeCount = explode(',', $resultCountArray[5]);
$europeFormatCount = '';
foreach ($explodeCount as $val) {
$europeFormatCount .= $val;
}
if ($europeFormatCount > MAX_RESULT_ALL_PAGES) {
$queryUrl = AMAZON_SEARCH_URL.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
}
At the moment the total number of results is not retrieved correctly, and the condition is never met even when it should be.
Does anyone have a solution to this problem, or another way of doing it?

I would simply fetch the page as a string (not html) and use a regular expression to get the total number of results. The code would look something like this:
define('MAX_RESULT_ALL_PAGES', 1200);
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT . $searchMonthUrlParam . $searchYearUrlParam . $searchTypeUrlParam . urlencode($keyword) . '&page=' . $pageNum;
$queryResult = file_get_contents($queryUrl);
if (preg_match('/of\s+([0-9,]+)\s+Results/', $queryResult, $matches)) {
$totalResults = (int) str_replace(',', '', $matches[1]);
} else {
throw new \RuntimeException('Total number of results not found');
}
if ($totalResults > MAX_RESULT_ALL_PAGES) {
$queryUrl = AMAZON_SEARCH_URL . $searchMonthUrlParam . $searchYearUrlParam . $searchTypeUrlParam . urlencode($keyword) . '&page=' . $pageNum;
// ...
}
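The extraction step can be tried without any network access. This is a minimal sketch that runs the same pattern against the HTML fragment from the question; the helper name extractTotalResults() is made up for illustration.

```php
<?php
// The <h2> fragment quoted in the question, used as a fixed test input
$html = '<h2 id="resultCount" class="resultCount">
<span>
Showing 1 - 12 of 40,923 Results
</span>
</h2>';

// Hypothetical helper: returns the total as an int, or null if it is not found
function extractTotalResults($html) {
    if (!preg_match('/of\s+([0-9,]+)\s+Results/', $html, $matches)) {
        return null;
    }
    // Drop the thousands separators before converting to an integer
    return (int) str_replace(',', '', $matches[1]);
}

$total = extractTotalResults($html);
var_dump($total);
var_dump($total > 1200);
```

Returning null instead of throwing is only a choice made for this sketch; the exception in the answer above is just as valid.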

A regex would do it:
...
preg_match("/of ([0-9,]+) Results/", $htmlResultCount[0], $matches);
$europeFormatCount = intval(str_replace(",", "", $matches[1]));
...

Please try this code.
define("MAX_RESULT_ALL_PAGES", 1200);
// new dom object
$dom = new DOMDocument();
// HTML string
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
$html_string = file_get_contents($queryUrl);
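//load the html
$html = $dom->loadHTML($html_string);
// note: preserveWhiteSpace is a parse-time setting, so assigning it after loadHTML() does not change the already loaded document (TRUE is also the default)
$dom->preserveWhiteSpace = TRUE;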
//Get all h2 tags
$nodes = $dom->getElementsByTagName('h2');
// Store total result count
$totalCount = 0;
// loop over the all h2 tags and print result
foreach ($nodes as $node) {
if ($node->hasAttributes()) {
foreach ($node->attributes as $attribute) {
if ($attribute->name === 'class' && $attribute->value == 'resultCount') {
$inner_html = str_replace(',', '', trim($node->nodeValue));
$inner_html_array = explode(' ', $inner_html);
// Add the count (sixth word of "Showing 1 - 12 of 40923 Results") to the total
$totalCount += (int) $inner_html_array[5];
}
}
}
}
// If the result count is greater than 1200, do this
if ($totalCount > MAX_RESULT_ALL_PAGES) {
$queryUrl = AMAZON_SEARCH_URL.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
}

Give this a try:
$match =array();
preg_match('/(?<=of\s)(?:\d{1,3}+(?:,\d{3})*)(?=\sResults)/', $htmlResultCount[0], $match);
$europeFormatCount = str_replace(',','',$match[0]);
The regex reads the number between "of " and " Results"; it matches numbers with ',' as the thousands separator.

Adding a char to all array items apart from the last using for/foreach

I have an array, which I am looping over with the following code:
foreach ($taglist as $tag=>$size){
echo link_to(
$tag,
"#search-tag?tag=" . strtolower($tag),
array(
"class" => 'tag' . $size,
"title" => "View all articles tagged '" . $tag . "'"
)
);
}
Now, this simply prints a hyperlink for each tag.
What I'm looking to do is add the pipe character ( | ) after every link apart from the last one.
Could I do this in a loop?
Thanks

$k = 0;
foreach($taglist as $tag=>$size)
{
$k++;
echo link_to($tag, ...);
if ($k != sizeof($taglist)) echo '|';
}

You can use a plain old boolean variable:
$first = true;
foreach($taglist as $tag=>$size){
if ($first) $first = false; else echo '|';
echo link_to($tag, ...);
}
Note that technically, this code outputs a bar before every element except the first, which has the exact same effect as outputting a bar after every element except the last.

Use a temporary array, then join the elements:
$links = array();
foreach($taglist as $tag=>$size){
$links[] = link_to($tag, ...);
}
echo implode('|', $links);
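A runnable sketch of the collect-then-implode approach follows. Because link_to() belongs to the asker's framework, a made-up make_link() helper stands in for it here, and the tag list is sample data.

```php
<?php
// Hypothetical stand-in for the framework's link_to() helper
function make_link($text, $href, array $attrs) {
    $html = '<a href="' . htmlspecialchars($href) . '"';
    foreach ($attrs as $name => $value) {
        $html .= ' ' . $name . '="' . htmlspecialchars($value) . '"';
    }
    return $html . '>' . htmlspecialchars($text) . '</a>';
}

// Sample data: tag name => size class
$taglist = array('PHP' => 3, 'XML' => 1);

$links = array();
foreach ($taglist as $tag => $size) {
    $links[] = make_link(
        $tag,
        '#search-tag?tag=' . strtolower($tag),
        array('class' => 'tag' . $size)
    );
}

// implode() puts the separator between elements only, never after the last
$output = implode(' | ', $links);
echo $output, "\n";
```

This also handles the empty and single-element cases without any special-casing.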

You can use a CachingIterator
$links = new CachingIterator(new ArrayIterator($taglist));
foreach($links as $tag => $size) {
echo link_to(/* bla */), $links->hasNext() ? '|' : '';
}
For more info on the CachingIterator see my answer at Peek ahead when iterating an array in PHP
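As a minimal sketch of the CachingIterator idea, the following prints only the tag names from a made-up tag list, so that the separator logic is visible without a link helper.

```php
<?php
// Sample data: tag name => size class
$taglist = array('PHP' => 3, 'XML' => 1, 'Regex' => 2);

// CachingIterator reads one element ahead, so hasNext() can tell
// whether the current element is the last one
$it = new CachingIterator(new ArrayIterator($taglist));

$output = '';
foreach ($it as $tag => $size) {
    $output .= $tag . ($it->hasNext() ? '|' : '');
}

echo $output, "\n";
```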

php associative arrays, regex, array

I currently have the following code :
$content = "
<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>";
I need to find a method to create an array as name => value, e.g. Manufacturer => John Deere.
Can anyone help me with a simple code snippet? I tried some regex, but it doesn't even work to extract the names or values, e.g.:
$pattern = "/<name>Manufacturer<\/name><value>(.*)<\/value>/";
preg_match_all($pattern, $content, $matches);
$st_selval = $matches[1][0];

You don't want to use regex for this. Try out something like SimpleXML
EDIT
Well, why don't you start with this:
<?php
$content = "<root>" . $content . "</root>";
$xml = new SimpleXMLElement($content);
print_r($xml);
?>
EDIT 2
Despite the fact that some of the answers posted using regular expression MAY work, you should get in the habit of using the correct tool for the job and regular expressions are not the correct tool for parsing of XML.

I'm using your $content variable:
$preg1 = preg_match_all('#<name>([^<]+)#', $content, $name_arr);
$preg2 = preg_match_all('#<value>([^<]+)#', $content, $val_arr);
$array = array_combine($name_arr[1], $val_arr[1]);

This is rather simple, can be solved by regex. Should be:
$name = '<name>\s*([^<]+)</name>\s*';
$value = '<value>\s*([^<]+)</value>\s*';
# no space between the two parts: the input has none between </name> and <value>
$pattern = "|$name$value|";
preg_match_all($pattern, $content, $matches);
# create hash
$stuff = array_combine($matches[1], $matches[2]);
# display
var_dump($stuff);
Regards
rbo

First of all, never use regex to parse xml...
You could do this with an XPATH query...
First, wrap the content in a root tag to make the parser happy (if it doesn't already have it):
$content = '<root>' . $content . '</root>';
Then, load the document
$dom = new DomDocument();
$dom->loadXml($content);
Then, initialize the XPATH
$xpath = new DomXpath($dom);
Write your query (wrapped in string(), so that evaluate() returns a string instead of a node list):
$xpathQuery = 'string(//name[text()="Manufacturer"]/following-sibling::value[1])';
Then, execute it:
$manufacturer = $xpath->evaluate($xpathQuery);
If I did the XPath right, $manufacturer should be John Deere...
You can see the docs on DomXpath, a basic primer on XPath, and a bunch of XPath examples...
Edit: Alternatively, you can select the <name> element and step to its next sibling in PHP instead of in the XPath query:
$xpathQuery = '//name[text()="Manufacturer"]';
$elements = $xpath->query($xpathQuery);
$manufacturer = $elements->item(0)->nextSibling->nodeValue;
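For the original goal of a full name => value array, the same DOM approach can be extended to every pair. A minimal sketch, assuming each <name> is followed by its <value> as in the question's data (the <root> wrapper is added only to make the fragment well-formed):

```php
<?php
$content = '<name>Manufacturer</name><value>John Deere</value>'
    . '<name>Year</name><value>2001</value>'
    . '<name>Location</name><value>NSW</value>'
    . '<name>Hours</name><value>6320</value>';

$dom = new DOMDocument();
$dom->loadXML('<root>' . $content . '</root>');
$xpath = new DOMXPath($dom);

$pairs = array();
foreach ($xpath->query('//name') as $nameNode) {
    // Evaluate relative to the current <name>: its first following <value> sibling
    $value = $xpath->evaluate('string(following-sibling::value[1])', $nameNode);
    $pairs[$nameNode->nodeValue] = $value;
}

print_r($pairs);
```

All values come back as strings; cast them where a number is needed.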

I think this is what you're looking for:
<?php
$content = "<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>";
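// [^<]* instead of \w* so that values containing spaces ("John Deere") match too
$pattern = "(<name>([^<]*)</name><value>([^<]*)</value>)";
preg_match_all($pattern, $content, $matches);
$arr = array();
// $matches[1] holds the names, $matches[2] the values
$count = count($matches[1]);
for ($i = 0; $i < $count; $i++) {
    $arr[$matches[1][$i]] = $matches[2][$i];
}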
/* This is an example on how to use it */
echo "Location: " . $arr["Location"] . "<br><br>";
/* This is the array */
print_r($arr);
?>
If your array has a lot of elements, don't call count() in the for loop condition; calculate the value once beforehand and reuse it.

I'll edit this if my PHP is wrong, but here's some PHP (pseudo-)code to give some direction.
$pattern = '|<name>([^<]*)</name>\s*<value>([^<]*)</value>|';
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
$arr = array();
foreach ($matches as $match) {
    $arr[$match[1]] = $match[2];
}
$arr is the array in which the name/value pairs are stored.

Using XMLReader:
$content = '<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>';
$content = '<content>' . $content . '</content>';
$output = array();
$reader = new XMLReader();
$reader->XML($content);
$currentKey = null;
$currentValue = null;
while ($reader->read()) {
switch ($reader->name) {
case 'name':
$reader->read();
$currentKey = $reader->value;
$reader->read();
break;
case 'value':
$reader->read();
$currentValue = $reader->value;
$reader->read();
break;
}
if (isset($currentKey) && isset($currentValue)) {
$output[$currentKey] = $currentValue;
$currentKey = null;
$currentValue = null;
}
}
print_r($output);
The output is:
Array
(
[Manufacturer] => John Deere
[Year] => 2001
[Location] => NSW
[Hours] => 6320
)

Need some help with XML parsing

The XML feed is located at: http://xml.betclick.com/odds_fr.xml
I need a PHP loop to echo the name of the match, the hour, the bet options and the odds links.
The function should select and display ONLY the matches of the day with streaming="1" and the bet type "Ftb_Mr3".
I'm new to xpath and simplexml.
Thanks in advance.
So far I have:
<?php
$xml_str = file_get_contents("http://xml.betclick.com/odds_fr.xml");
$xml = simplexml_load_string($xml_str);
// need xpath magic
$xml->xpath();
// display
?>

XPath is pretty simple once you get the hang of it.
You basically want to get every match tag with a certain attribute:
//match[@streaming=1]
This will work perfectly; it gets every match tag anywhere in the document whose streaming attribute equals 1.
And I just realised you also want matches with a bet type of "Ftb_Mr3":
//match[@streaming=1]/bets/bet[@code="Ftb_Mr3"]
This will return the bet node though; we want the match, which we know is the grandparent:
//match[@streaming=1]/bets/bet[@code="Ftb_Mr3"]/../..
The two dots work like they do in file paths, and get the match.
Now, to work this into your sample, just change the final bit to
// need xpath magic
$nodes = $xml->xpath('//match[@streaming=1]/bets/bet[@code="Ftb_Mr3"]/../..');
foreach($nodes as $node) {
echo $node['name'].'<br/>';
}
to print all the match names.
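Here is a self-contained sketch of that selection. The inline XML is an assumption about the feed's layout, pieced together from the attribute names used in this thread (streaming, code, name, start_date, odd); the real feed may differ. It uses a nested predicate instead of /../.., which selects the same match nodes, and it leaves out the "matches of the day" date filter.

```php
<?php
// Hypothetical sample: element nesting and values are assumptions
$sample = <<<XML
<sports>
  <sport name="Football">
    <event name="Ligue 1">
      <match name="Team A - Team B" streaming="1" start_date="2011-05-01T20:00:00">
        <bets>
          <bet code="Ftb_Mr3" name="Match Result">
            <choice name="%1%" odd="1.85"/>
            <choice name="Draw" odd="3.20"/>
            <choice name="%2%" odd="4.00"/>
          </bet>
        </bets>
      </match>
      <match name="Team C - Team D" streaming="0" start_date="2011-05-01T21:00:00">
        <bets>
          <bet code="Ftb_Mr3" name="Match Result">
            <choice name="%1%" odd="2.10"/>
          </bet>
        </bets>
      </match>
    </event>
  </sport>
</sports>
XML;

$xml = simplexml_load_string($sample);

// Streamed matches that contain at least one Ftb_Mr3 bet
$matches = $xml->xpath('//match[@streaming="1"][bets/bet[@code="Ftb_Mr3"]]');

$lines = array();
foreach ($matches as $match) {
    // Relative query: only the choices of this match's Ftb_Mr3 bet
    foreach ($match->xpath('bets/bet[@code="Ftb_Mr3"]/choice') as $choice) {
        $lines[] = $match['name'] . ' | ' . $match['start_date']
            . ' | ' . $choice['name'] . ' @ ' . $choice['odd'];
    }
}

echo implode("\n", $lines), "\n";
```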

I don't know how to work xpath really, but if you want to 'loop it', this should get you started:
<?php
$xml = simplexml_load_file("odds_fr.xml");
foreach ($xml->children() as $child)
{
foreach ($child->children() as $child2)
{
foreach ($child2->children() as $child3)
{
foreach($child3->attributes() as $a => $b)
{
echo $a,'="',$b,"\"<br/>";
}
}
}
}
?>
That gets you to the 'match' tag which has the 'streaming' attribute. I don't really know what 'matches of the day' are, either, but...
It's basically right out of the W3Schools reference:
http://www.w3schools.com/PHP/php_ref_simplexml.asp

I am using this on a project, scraping Betclic odds with:
<?php
$match_csv = fopen('matches.csv', 'w');
$bet_csv = fopen('bets.csv', 'w');
$xml = simplexml_load_file('http://xml.cdn.betclic.com/odds_en.xml');
$bookmaker = 'Betclick';
foreach ($xml as $sport) {
$sport_name = $sport->attributes()->name;
foreach ($sport as $event) {
$event_name = $event->attributes()->name;
foreach ($event as $match) {
$match_name = $match->attributes()->name;
$match_id = $match->attributes()->id;
$match_start_date_str = str_replace('T', ' ', $match->attributes()->start_date);
$match_start_date = strtotime($match_start_date_str);
if (!empty($match->attributes()->live_id)) {
$match_is_live = 1;
} else {
$match_is_live = 0;
}
if ($match->attributes()->streaming == 1) {
$match_is_running = 1;
} else {
$match_is_running = 0;
}
// pass the fields as an array so that a comma inside a name cannot split a column
$match_row = array($match_id, $bookmaker, $sport_name, $event_name, $match_name, $match_start_date, $match_is_live, $match_is_running);
fputcsv($match_csv, array_map('strval', $match_row));
foreach ($match as $bets) {
foreach ($bets as $bet) {
$bet_name = $bet->attributes()->name;
foreach ($bet as $choice) {
// team numbers are surrounded by %, we strip them
$choice_name = str_replace('%', '', $choice->attributes()->name);
// get the float value of the odds
$odd = (float)$choice->attributes()->odd;
// build the row as an array so that a comma inside a name cannot split a column
$bet_row = array($match_id, $bet_name, $choice_name, $odd);
fputcsv($bet_csv, array_map('strval', $bet_row));
}
}
}
}
}
}
fclose($match_csv);
fclose($bet_csv);
?>
Then loading the csv files into mysql. Running it once a minute, works great so far.
