Need some help with XML parsing


The XML feed is located at: http://xml.betclick.com/odds_fr.xml
I need a PHP loop that echoes the name of each match, the hour, the bet options and the odds links.
The function should select and display ONLY the matches of the day with streaming="1" and the bet type "Ftb_Mr3".
I'm new to XPath and SimpleXML.
Thanks in advance.
So far I have:
<?php
$xml_str = file_get_contents("http://xml.betclick.com/odds_fr.xml");
$xml = simplexml_load_string($xml_str);
// need xpath magic
$xml->xpath();
// display
?>

XPath is pretty simple once you get the hang of it.
You basically want every match tag with a certain attribute:
//match[@streaming=1]
This works perfectly: it selects every match element anywhere in the document whose streaming attribute equals 1.
I just realised you also want only matches that have a bet of type "Ftb_Mr3":
//match[@streaming=1]/bets/bet[@code="Ftb_Mr3"]
This returns the bet node, though. We want the match, which we know is its grandparent:
//match[@streaming=1]/bets/bet[@code="Ftb_Mr3"]/../..
The two dots work like they do in file paths: each one steps up to the parent, so two of them get the match.
To work this into your sample, change the final bit to:
// need xpath magic
$nodes = $xml->xpath('//match[@streaming=1]/bets/bet[@code="Ftb_Mr3"]/../..');
foreach($nodes as $node) {
echo $node['name'].'<br/>';
}
This prints all the match names.
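To also cover the hour, the "of the day" filter and the odds, here is a minimal sketch. It runs against a small inline sample rather than the live feed, and it assumes the layout suggested elsewhere in this thread (sport > event > match > bets > bet > choice, with start_date, streaming, code and odd attributes). The sample data, the match names and the idea that "of the day" means start_date begins with today's date are all assumptions.

```php
<?php
// Hypothetical sample mirroring the assumed feed layout.
$today = date('Y-m-d');
$xml_str = <<<XML
<sports>
  <sport name="Football">
    <event name="Ligue 1">
      <match name="Lyon - Nice" start_date="{$today}T20:45:00" streaming="1">
        <bets>
          <bet code="Ftb_Mr3" name="Match Result">
            <choice name="%1%" odd="1.85"/>
            <choice name="Draw" odd="3.40"/>
            <choice name="%2%" odd="4.10"/>
          </bet>
        </bets>
      </match>
      <match name="Lille - Metz" start_date="{$today}T18:00:00" streaming="0">
        <bets>
          <bet code="Ftb_Mr3" name="Match Result">
            <choice name="%1%" odd="1.50"/>
          </bet>
        </bets>
      </match>
    </event>
  </sport>
</sports>
XML;

$xml = simplexml_load_string($xml_str);
$lines = array();

// Predicate form: keep matches that stream, start today and offer an Ftb_Mr3 bet.
$query = '//match[@streaming="1"]'
       . '[starts-with(@start_date, "' . $today . '")]'
       . '[bets/bet[@code="Ftb_Mr3"]]';

foreach ($xml->xpath($query) as $match) {
    // The hour is cut out of the assumed "YYYY-MM-DDTHH:MM:SS" format.
    $hour = substr((string) $match['start_date'], 11, 5);
    $lines[] = $match['name'] . ' (' . $hour . ')';

    // List only the choices of the Ftb_Mr3 bet, stripping the % markers.
    foreach ($match->xpath('bets/bet[@code="Ftb_Mr3"]/choice') as $choice) {
        $name = str_replace('%', '', (string) $choice['name']);
        $lines[] = '  ' . $name . ': ' . $choice['odd'];
    }
}

echo implode("\n", $lines), "\n";
```

Putting the bet condition in a predicate keeps the match element as the result, so the /../.. step back up is not needed.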

I don't know how to work xpath really, but if you want to 'loop it', this should get you started:
<?php
$xml = simplexml_load_file("odds_fr.xml");
foreach ($xml->children() as $child)
{
foreach ($child->children() as $child2)
{
foreach ($child2->children() as $child3)
{
foreach($child3->attributes() as $a => $b)
{
echo $a,'="',$b,"\"<br/>";
}
}
}
}
?>
That gets you to the 'match' tag which has the 'streaming' attribute. I don't really know what 'matches of the day' are, either, but...
It's basically right out of the W3Schools reference:
http://www.w3schools.com/PHP/php_ref_simplexml.asp

I am using this on a project, scraping Betclic odds with:
<?php
$match_csv = fopen('matches.csv', 'w');
$bet_csv = fopen('bets.csv', 'w');
$xml = simplexml_load_file('http://xml.cdn.betclic.com/odds_en.xml');
$bookmaker = 'Betclick';
foreach ($xml as $sport) {
$sport_name = $sport->attributes()->name;
foreach ($sport as $event) {
$event_name = $event->attributes()->name;
foreach ($event as $match) {
$match_name = $match->attributes()->name;
$match_id = $match->attributes()->id;
$match_start_date_str = str_replace('T', ' ', $match->attributes()->start_date);
$match_start_date = strtotime($match_start_date_str);
if (!empty($match->attributes()->live_id)) {
$match_is_live = 1;
} else {
$match_is_live = 0;
}
if ($match->attributes()->streaming == 1) {
$match_is_running = 1;
} else {
$match_is_running = 0;
}
// pass the fields as an array so fputcsv quotes names that contain commas
$match_row = array((string)$match_id, $bookmaker, (string)$sport_name, (string)$event_name, (string)$match_name, $match_start_date, $match_is_live, $match_is_running);
fputcsv($match_csv, $match_row);
foreach ($match as $bets) {
foreach ($bets as $bet) {
$bet_name = $bet->attributes()->name;
foreach ($bet as $choice) {
// team numbers are surrounded by %, we strip them
$choice_name = str_replace('%', '', $choice->attributes()->name);
// get the float value of the odds
$odd = (float)$choice->attributes()->odd;
// pass the fields as an array so fputcsv quotes values that contain commas
$bet_row = array((string)$match_id, (string)$bet_name, $choice_name, $odd);
fputcsv($bet_csv, $bet_row);
}
}
}
}
}
}
fclose($match_csv);
fclose($bet_csv);
?>
I then load the CSV files into MySQL. It runs once a minute and has worked great so far.

Related

Parse all items with PHP Simple HTML DOM Parser

I'm trying to parse a page with Simple HTML DOM Parser in PHP. The parsed content is a list of movies, and I want to get every genre of each movie, but when I run my code I only get the last genre, not all of them. My code looks like this:
if ($obj) {
foreach($obj as $key => $data) {
$item['url'] = 'http://geo.saitebi.ge/movie/' . $page;
$item['poster'] = 'http://geo.saitebi.ge/web/ka/img/movies/' . $page . '/240x340/' . $page . '.jpg';
$item['geotitle'] = $data->find('div.movie-item-title', 0)->plaintext;
$item['englishtitle'] = $data->find('div.movie-item-title-en', 0)->plaintext;
$item['year'] = $data->find('div.movie-item-year', 0)->plaintext;
foreach($data->find('a.movie-genre-item') as $genre) {
$item['genres'] = $genre->plaintext . ', ';
}
$item['description'] = $data->find('div.movie-desctiption-more', 0)->plaintext;
$item['imdb_rating'] = $data->find('a.imdb_vote', 0)->plaintext;
$item['imdb_id'] = trim(substr($data->find('a.imdb_vote',0)->href, strrpos($data->find('a.imdb_vote',0)->href, '/') + 1));
}
}
As you can see, I'm collecting the content into an array. Inside the loop I run another foreach to get all the genre items, but it only keeps the last one. What is wrong with my code?

You are just overwriting the last set of data each time. You need to set it to blank and then append it using .= each time, like...
$item['genres'] = '';
foreach($data->find('a.movie-genre-item') as $genre) {
$item['genres'] .= $genre->plaintext . ', ';
}

The code stores each $genre->plaintext in $item under the same 'genres' key,
so every iteration replaces the previous value:
foreach($data->find('a.movie-genre-item') as $genre) {
$item['genres'] = $genre->plaintext . ', ';
}
Instead, $item['genres'] can be an array. I mean:
$item['genres'] = array();
foreach($data->find('a.movie-genre-item') as $genre) {
$item['genres'][] = $genre->plaintext;
}

Here is the genre code you need to update:
// Here is your genre code
$movie_genre='';
foreach($data->find('a.movie-genre-item') as $genre) {
$movie_genre .= $genre->plaintext . ',';
}
// Here you can use rtrim for removing last comma from genre
$item['genres'] = rtrim($movie_genre,',');
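The append-then-trim pattern above can also be written by collecting the values and joining them once. The sketch below uses a plain array of hypothetical genre names in place of the $genre->plaintext values, so it runs without the Simple HTML DOM library.

```php
<?php
// Hypothetical values standing in for each $genre->plaintext.
$genre_texts = array(' Drama ', 'Comedy', ' Thriller');

$genres = array();
foreach ($genre_texts as $text) {
    // trim() removes stray whitespace around each scraped name
    $genres[] = trim($text);
}

// implode() puts the separator only between elements, so no rtrim is needed.
$item = array('genres' => implode(', ', $genres));

echo $item['genres'], "\n";
```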

find a element in html and explode it for stock

I want to retrieve an HTML element in a page.
<h2 id="resultCount" class="resultCount">
<span>
Showing 1 - 12 of 40,923 Results
</span>
</h2>
I need the total number of results for a test in my PHP code.
For now, I take everything between the h2 tags and explode it on spaces.
Then I explode the number again on the comma and concatenate the pieces, so the thousands separator is removed and the value can be compared as a number. Once that is done, I test the number of results.
define("MAX_RESULT_ALL_PAGES", 1200);
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
$htmlResultCountPage = file_get_html($queryUrl);
$htmlResultCount = $htmlResultCountPage->find("h2[id=resultCount]");
$resultCountArray = explode(" ", $htmlResultCount[0]);
$explodeCount = explode(',', $resultCountArray[5]);
$europeFormatCount = '';
foreach ($explodeCount as $val) {
$europeFormatCount .= $val;
}
if ($europeFormatCount > MAX_RESULT_ALL_PAGES) {
$queryUrl = AMAZON_SEARCH_URL.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
}
At the moment the total number of results is not extracted correctly, and the condition is never met even when it should be.
Does anyone have a solution to this problem, or another way to do it?

I would simply fetch the page as a string (not html) and use a regular expression to get the total number of results. The code would look something like this:
define('MAX_RESULT_ALL_PAGES', 1200);
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT . $searchMonthUrlParam . $searchYearUrlParam . $searchTypeUrlParam . urlencode($keyword) . '&page=' . $pageNum;
$queryResult = file_get_contents($queryUrl);
if (preg_match('/of\s+([0-9,]+)\s+Results/', $queryResult, $matches)) {
$totalResults = (int) str_replace(',', '', $matches[1]);
} else {
throw new \RuntimeException('Total number of results not found');
}
if ($totalResults > MAX_RESULT_ALL_PAGES) {
$queryUrl = AMAZON_SEARCH_URL . $searchMonthUrlParam . $searchYearUrlParam . $searchTypeUrlParam . urlencode($keyword) . '&page=' . $pageNum;
// ...
}
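As a quick check of the pattern, here is a minimal sketch that applies the same regular expression to the h2 snippet from the question instead of a fetched page. The $html string is copied from the question; nothing is downloaded.

```php
<?php
// The markup from the question, inlined so the example runs offline.
$html = '<h2 id="resultCount" class="resultCount">
<span>
Showing 1 - 12 of 40,923 Results
</span>
</h2>';

$total = null;
if (preg_match('/of\s+([0-9,]+)\s+Results/', $html, $matches)) {
    // Drop the thousands separators before casting to an integer.
    $total = (int) str_replace(',', '', $matches[1]);
}

var_dump($total);
```

The cast to int matters: comparing the raw string "40,923" with 1200 would not give a reliable numeric comparison.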

A regex would do it:
...
preg_match("/of ([0-9,]+) Results/", $htmlResultCount[0], $matches);
$europeFormatCount = intval(str_replace(",", "", $matches[1]));
...

Please try this code.
define("MAX_RESULT_ALL_PAGES", 1200);
// new dom object
$dom = new DOMDocument();
// HTML string
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
$html_string = file_get_contents($queryUrl);
// discard insignificant white space (must be set before loading)
$dom->preserveWhiteSpace = false;
// load the HTML; @ suppresses warnings about malformed markup
@$dom->loadHTML($html_string);
//Get all h2 tags
$nodes = $dom->getElementsByTagName('h2');
// Store total result count
$totalCount = 0;
// loop over all h2 tags and add up the result counts
foreach ($nodes as $node) {
if ($node->hasAttributes()) {
foreach ($node->attributes as $attribute) {
if ($attribute->name === 'class' && $attribute->value == 'resultCount') {
$inner_html = str_replace(',', '', trim($node->nodeValue));
$inner_html_array = explode(' ', $inner_html);
// The sixth word of "Showing 1 - 12 of 40923 Results" is the total
$totalCount += (int) $inner_html_array[5];
}
}
}
}
// If the result count is greater than 1200, do this
if ($totalCount > MAX_RESULT_ALL_PAGES) {
$queryUrl = AMAZON_SEARCH_URL.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
}

Give this a try:
$match = array();
preg_match('/(?<=of\s)(?:\d{1,3}+(?:,\d{3})*)(?=\sResults)/', $htmlResultCount[0], $match);
$europeFormatCount = str_replace(',', '', $match[0]);
The regex reads the number between "of " and " Results"; it matches numbers that use ',' as the thousands separator.

How to introduce white space in following?

The following snippet of PHP code creates $desc all right, but I would like it to insert two (2) blank spaces between every dpItemFeaturesList item it finds as it iterates.
I can't work out what to add, or where, to do this.
function get_description($asin){
$url = 'http://www.amazon.com/gp/aw/d/' . $asin . '?d=f&pd=1';
$data = request_data($url);
$desc = '';
if ($data) {
$dom = new DOMDocument();
@$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
if (preg_match('#dpItemFeaturesList#',$data)){
$k = $xpath->query('//ul[@class="dpItemFeaturesList"]');
foreach ($k as $c => $tot) {
$desc .= $tot->nodeValue;
}
}
}
return $desc;
}

Looking at the code you have shared here, and at the data you are processing (a sample of which I have pasted here), you actually want to collect the text within the <li> child elements of the <ul class="dpItemFeaturesList"> node.
In your original code snippet your XPath is as follows:
'//ul[@class="dpItemFeaturesList"]'
This selects only the <ul> element and not its child elements. Consequently, $tot->nodeValue concatenates the text of all its child nodes without spaces (ah ha, the real reason why you want spaces in the first place).
To fix this we should do two things:
1. Select the <li> nodes within the appropriate node. Change the XPath to //ul[@class="dpItemFeaturesList"]/li.
2. In the foreach loop, concatenate 2 non-breaking spaces (because this is HTML) to the $desc variable.
Here $c is the index of the current node in the result list.
function get_description($asin){
$url = 'http://www.amazon.com/gp/aw/d/' . $asin . '?d=f&pd=1';
$data = request_data($url);
$desc = '';
if ($data) {
$dom = new DOMDocument();
@$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
if (preg_match('#dpItemFeaturesList#',$data)){
$k = $xpath->query('//ul[@class="dpItemFeaturesList"]/li');
foreach ($k as $c => $tot) {
if ($c > 0) {
$desc .= "&nbsp;&nbsp;";
}
$desc .= $tot->nodeValue;
}
}
}
return $desc;
}
We check for $c > 0 so that the separator is added only between items: nothing is added before the first node, and no extra spaces end up after the last one.
P.S.: Unrelated to your original question. The code for which you shared a link has an undefined variable $timestamp in $date = date("format", $timestamp); on line 116.

Since you're appending everything to desc, try something like
$desc .= $tot->nodeValue;
$desc .= "<br />";

Try this:
$desc .= $tot->nodeValue.'  ';
and call trim($desc) after the loop to avoid two spaces at the end.
Or, alternatively, create an array:
$desc = array();
//....
$desc[] = $tot->nodeValue;
and return implode('  ', $desc).
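The array-and-implode variant can be sketched end to end as follows. The HTML string is a hypothetical stand-in for the fetched Amazon page, and the feature texts are made up; only the dpItemFeaturesList class name comes from the question.

```php
<?php
// Hypothetical markup standing in for the page returned by request_data().
$html = '<ul class="dpItemFeaturesList">'
      . '<li>Stainless steel</li><li>Dishwasher safe</li><li>2-year warranty</li>'
      . '</ul>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select the <li> children so each feature is a separate node.
$features = array();
foreach ($xpath->query('//ul[@class="dpItemFeaturesList"]/li') as $li) {
    $features[] = trim($li->nodeValue);
}

// Two spaces between features, none at either end.
$desc = implode('  ', $features);

echo $desc, "\n";
```

Because implode() only places the separator between elements, no index check or trailing trim is needed.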

If you need that between each one, add the spaces in front on every iteration but the first:
$k = $xpath->query('//ul[@class="dpItemFeaturesList"]');
foreach ($k as $c => $tot) {
$c && $desc .= ' '; # all but first
$desc .= $tot->nodeValue;
}
This expression saves you an if, but it works the same way. It is partly a matter of taste, so an if does the job as well:
$k = $xpath->query('//ul[@class="dpItemFeaturesList"]');
foreach ($k as $c => $tot) {
if($c) $desc .= ' '; # all but first
$desc .= $tot->nodeValue;
}
This works because every integer number but zero is true in PHP.
See the demo.

DOMdocument search for tag

I am trying to do this:
I have several thousand XML files. I read them and look for a particular text inside a specific tag, but the tag that holds the text I need differs from file to file. What I have done so far is this:
$xml_filename = "xml/".$anzeigen_id.".xml";
$dom = new DOMDocument();
$dom->load($xml_filename);
$value = $dom->getElementsByTagName('FormattedPositionDescription');
foreach($value as $v){
$text = $v->getElementsByTagName('Value');
foreach($text as $t){
$anzeige_txt = $t->nodeValue;
$anzeige_txt = utf8_decode($anzeige_txt);
$anzeige_txt = mysql_real_escape_string($anzeige_txt);
echo $anzeige_txt;
$sql = "INSERT INTO joinvision_anzeige(`firmen_id`,`anzeige_id`,`anzeige_txt`) VALUES ('$firma_id','$anzeigen_id','$anzeige_txt')";
$sql_inserted = mysql_query($sql);
if($sql_inserted){
echo "'$anzeigen_id' from $xml_filename inserted<br />";
}else{
echo mysql_errno() . ": " . mysql_error() . "\n";
}
}
}
Now what I need to do is this:
look for FormattedPositionDescription in the XML, and if that tag is not there, look for anothertag in the same XML file.
How can I do this? Thanks in advance for your help.

Just check the length property of the DOMNodeList:
$value = $dom->getElementsByTagName('FormattedPositionDescription');
if($value->length > 0)
{
// found some FormattedPositionDescription
}
else
{
// didn't find any FormattedPositionDescription, so look for anothertag
$list = $dom->getElementsByTagName('anothertag');
}
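With several candidate tags, the length check can be wrapped in a small helper that tries each name in order. This is only a sketch: find_first_tag() is a hypothetical helper, and the inline XML is a made-up document that lacks FormattedPositionDescription so the fallback is exercised.

```php
<?php
// Hypothetical helper: return the first non-empty node list among the
// candidate tag names, or null when none of the tags is present.
function find_first_tag(DOMDocument $dom, array $tag_names) {
    foreach ($tag_names as $tag_name) {
        $list = $dom->getElementsByTagName($tag_name);
        if ($list->length > 0) {
            return $list;
        }
    }
    return null;
}

// Made-up document that only contains the fallback tag.
$dom = new DOMDocument();
$dom->loadXML('<Job><anothertag><Value>Fallback text</Value></anothertag></Job>');

$nodes = find_first_tag($dom, array('FormattedPositionDescription', 'anothertag'));
$text = $nodes ? trim($nodes->item(0)->nodeValue) : '';

echo $text, "\n";
```

Listing the tag names in priority order keeps the loop over thousands of files free of nested if/else blocks.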

adding a char to all array items apart from last using for/foreach

I have an array that I loop over with the following code:
foreach ($taglist as $tag=>$size){
echo link_to(
$tag,
"@search-tag?tag=" . strtolower($tag),
array(
"class" => 'tag' . $size,
"title" => "View all articles tagged '" . $tag . "'"
)
);
}
Now, this simply prints a hyperlink for each tag.
What I'm looking to do is add the pipe char ( | ) after every link, apart from the last one.
Could I do this in the loop?
Thanks

$k = 0;
foreach($taglist as $tag=>$size)
{
$k++;
echo link_to($tag, ...);
if ($k != sizeof($taglist)) echo '|';
}

You can use a plain old boolean variable:
$first = true;
foreach($taglist as $tag=>$size){
if ($first) $first = false; else echo '|';
echo link_to($tag, ...);
}
Note that technically, this code outputs a bar before every element except the first, which has the exact same effect as outputting a bar after every element except the last.

Use a temporary array, then join the elements:
$links = array();
foreach($taglist as $tag=>$size){
$links[] = link_to($tag, ...);
}
echo implode('|', $links);

You can use a CachingIterator
$links = new CachingIterator(new ArrayIterator($taglist));
foreach($links as $tag => $size) {
echo link_to(/* bla */), $links->hasNext() ? '|' : '';
}
For more info on the CachingIterator see my answer at Peek ahead when iterating an array in PHP
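For comparison, here is a runnable sketch of the implode and CachingIterator approaches side by side. The tag list is hypothetical, and make_link() is a made-up stand-in for the link_to() helper, so the markup it produces is only illustrative.

```php
<?php
// Hypothetical tag list (tag => size) and a stand-in for link_to().
$taglist = array('PHP' => 3, 'XML' => 1, 'XPath' => 2);

function make_link($tag, $size) {
    return '<a class="tag' . $size . '" href="?tag=' . strtolower($tag) . '">' . $tag . '</a>';
}

// Variant 1: collect the links, then join them with the separator.
$links = array();
foreach ($taglist as $tag => $size) {
    $links[] = make_link($tag, $size);
}
$with_implode = implode('|', $links);

// Variant 2: CachingIterator reads one element ahead, so hasNext()
// tells us whether a separator is still needed.
$with_iterator = '';
$it = new CachingIterator(new ArrayIterator($taglist));
foreach ($it as $tag => $size) {
    $with_iterator .= make_link($tag, $size) . ($it->hasNext() ? '|' : '');
}

echo $with_implode, "\n";
```

Both variants build the same string. implode() is the shorter option when the links can be held in memory, while the iterator lets you echo each link as you go.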
