how to get only some href attribute - php

I have this PHP code, with which I'm trying to extract some information, but I got stuck at the href step:
$site = "http://www.sports-reference.com/olympics/countries";
$site_html = file_get_html($site);
$country_dirty = $site_html->getElementById('div_countries');
foreach ($country_dirty->find('img') as $link) {
    $country = $link->alt;
    $link_country = "$site/$country";
    $link_country_html = file_get_html($link_country);
    $link_season = $link_country_html->getElementById('div_medals');
    foreach ($link_season->find('a') as $season) {
        echo $link_year_season = $season->href . "\n";
        //echo $link_season = strstr ($link_year_season,'summer') . "\n";
    }
}
The variable $link_year_season gets me the following output:
/olympics/countries/AFG/summer/2012/
/olympics/athletes/ba/nesar-ahmad-bahawi-1.html
/olympics/athletes/ni/rohullah-nikpai-1.html
/olympics/countries/AFG/summer/2008/
/olympics/athletes/ba/nesar-ahmad-bahawi-1.html
/olympics/athletes/ni/rohullah-nikpai-1.html
/olympics/countries/AFG/summer/2004/
/olympics/countries/AFG/summer/1996/
/olympics/countries/AFG/summer/1988/
/olympics/countries/AFG/summer/1980/
/olympics/countries/AFG/summer/1972/
.....
I'd like to know if it is possible to get only this output:
/olympics/countries/AFG/summer/2012/
/olympics/countries/AFG/summer/2008/
/olympics/countries/AFG/summer/2004/
/olympics/countries/AFG/summer/1996/
/olympics/countries/AFG/summer/1988/
/olympics/countries/AFG/summer/1980/
/olympics/countries/AFG/summer/1972/

You can use this regex to check that the link starts with /olympics/countries/AFG/summer/, followed by a number and a /:
foreach ($link_season->find('a') as $season) {
    if (preg_match('~^/olympics/countries/AFG/summer/\d+/~', $season->href)) {
        echo $link_year_season = $season->href . "\n";
        //echo $link_season = strstr ($link_year_season,'summer') . "\n";
    }
}
Demo: https://regex101.com/r/bZ1vP3/1
You could also pull out the year by capturing the number after summer (presuming it is a year). The first regex accepts any run of digits; this one is stricter and requires exactly four:
foreach ($link_season->find('a') as $season) {
    if (preg_match('~^/olympics/countries/AFG/summer/(\d{4})/~', $season->href, $year)) {
        echo $link_year_season = $season->href . "\n";
        //echo $link_season = strstr ($link_year_season,'summer') . "\n";
        echo 'The year is ' . $year[1] . "\n";
    }
}
If the season can also vary, use (?:summer|winter) in place of summer, which allows either summer or winter as the fourth directory.
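Because the outer loop visits every country, a pattern with AFG hard-coded only matches Afghanistan's links. Here is a minimal sketch of a more general filter. It assumes that country codes are always three uppercase letters, as in the sample output, and that the hrefs have already been collected as plain strings; filterSeasonLinks is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: keeps only links of the form
// /olympics/countries/<CODE>/<season>/<year>/
function filterSeasonLinks(array $hrefs): array
{
    // Assumption: <CODE> is three uppercase letters, <year> is four digits.
    $pattern = '~^/olympics/countries/[A-Z]{3}/(?:summer|winter)/\d{4}/$~';
    // preg_grep keeps the matching entries; array_values reindexes them from 0.
    return array_values(preg_grep($pattern, $hrefs));
}

$hrefs = [
    '/olympics/countries/AFG/summer/2012/',
    '/olympics/athletes/ba/nesar-ahmad-bahawi-1.html',
    '/olympics/countries/AFG/summer/2008/',
];
print_r(filterSeasonLinks($hrefs));
```

Inside the original loop, the same pattern can be passed to preg_match in place of the AFG-specific one.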

Related

Nested foreach when reading XML (in PHP)

I need to create a cron job that reads an XML file weekly. The XML file contains information about all the shows at a range of cinemas.
What I want to do is read the XML file, extract the information I need about each show, and then upload each show to a database. But I run into trouble when I start nesting the loops.
I want each tuple to contain the following information:
Title | FilmWebNr | Rating | Version | Center | Screen | Date | Time |
The URL for the XML is http://217.144.251.113/static/Shows_FilmWeb.php
Here is a pastebin where I try to list all the dates for each screen per Title.
Here is the result. As you can see, the dates are only displayed when there is more than one screen per Title. I don't get why the attributes array isn't always available.
I struggle with getting the last three (screen, date and time).
$map_url = "http://217.144.251.113/static/Shows_FilmWeb.php";
$response_xml_data = file_get_contents($map_url);
$data = simplexml_load_string($response_xml_data);
$array = (array) simplexml_load_string($response_xml_data);
$json = json_encode($array);
$configData = json_decode($json, true);
$movies = $configData['Performances']['Title'];
foreach ($movies as $title) {
    echo "Title: " . $title['@attributes']['Name'] . '<br/>';
    echo "FilmWebNr: " . $title['FilmWebNum'] . '<br/>';
    echo "Rating: " . $title['TitleRating'] . '<br/>';
    echo "Version: " . $title['TitleVersion'] . '<br/>';
    echo "Center: " . $title['Center']['@attributes']['Name'] . '<br/>';
    foreach ($title['Center']['Screen'] as $screen) {
        //here I run into trouble
    }
}
Let's say I try to add the following in the inner loop:
$screen['@attributes']['Name'];
I get an error saying "Undefined index: @attributes".
So sometimes the attributes seem to be in an array, and sometimes not, even though they are always part of the XML.
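Rather than going from XML to JSON to arrays, it is better to work with SimpleXML directly, which is quite easy once you are used to it. After the JSON round trip, an element that occurs once becomes a single associative array, while an element that occurs several times becomes a list of such arrays, which is why the attributes key is sometimes one level deeper than you expect.
The main thing is to get used to how the various elements are layered and to use foreach loops to iterate over each level: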
$map_url = "http://217.144.251.113/static/Shows_FilmWeb.php";
$response_xml_data = file_get_contents($map_url);
$data = simplexml_load_string($response_xml_data);
$movies = $data->Performances->Title;
foreach ($movies as $title) {
    echo "Title: " . $title['Name'] . '<br/>';
    echo "FilmWebNr: " . $title->FilmWebNum . '<br/>';
    echo "Rating: " . $title->TitleRating . '<br/>';
    echo "Version: " . $title->TitleVersion . '<br/>';
    echo "Center: " . $title->Center['Name'] . '<br/>';
    foreach ($title->Center->Screen as $screen) {
        echo "Screen: " . $screen['Name'] . '<br/>';
        foreach ($screen->Date as $date) {
            echo "Date: " . $date['Name'] . '<br/>';
            foreach ($date->ShowID as $showID) {
                echo "Time: " . $showID->Time . '<br/>';
            }
        }
    }
}
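To see why the array version behaves inconsistently, here is a small sketch that runs the same conversion on two made-up XML fragments (not the real feed), one with a single Screen and one with two. The toArray helper is hypothetical and only wraps the conversion used in the question.

```php
<?php
// Made-up sample data, not the real feed.
$single = '<Center Name="A"><Screen Name="1"/></Center>';
$double = '<Center Name="A"><Screen Name="1"/><Screen Name="2"/></Center>';

// Hypothetical helper: the XML -> JSON -> array conversion from the question.
function toArray(string $xml): array
{
    return json_decode(json_encode(simplexml_load_string($xml)), true);
}

$one = toArray($single);
$two = toArray($double);

// One <Screen>: the attributes sit directly under 'Screen'.
var_dump(isset($one['Screen']['@attributes']));    // bool(true)
// Two <Screen>s: 'Screen' is a list, so the attributes are one level deeper.
var_dump(isset($two['Screen'][0]['@attributes'])); // bool(true)
var_dump(isset($two['Screen']['@attributes']));    // bool(false)

// SimpleXML iterates the same way in both cases.
foreach ([$single, $double] as $xml) {
    foreach (simplexml_load_string($xml)->Screen as $screen) {
        echo $screen['Name'], "\n";
    }
}
```

With SimpleXML the inner foreach runs once for the first fragment and twice for the second, so no special case is needed.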

php echo statement inside $output

I have the following php code:
$skizzar_masonry_item_width = $masonry_item_width;
$skizzar_masonry_item_padding = $masonry_item_padding;
$skizzar_double_width_size = $masonry_item_width*2 +$masonry_item_padding;
$output .= '<style>.skizzar_masonry_entry.skizzar_ma_double, .skizzar_masonry_entry.skizzar_ma_double img {width:'.$skizzar_double_width_size.'}</style>';
return $output;
For some reason though, the value of $skizzar_double_width_size is not being added into the $output - is there a way to echo a value in an output variable?
As @Rizier123 mentioned, make sure you initialise any string variable before you append to it.
$var = '';
$var .= 'I appended';
$var .= ' a string!';
I would also like to strongly discourage you from using inline styles as well as generating them with inline PHP. Things get very messy very quickly.
In a situation like this you need to check that all the variables you are using in the calculation are valid before you panic.
So try
echo 'before I use these values they contain<br>';
echo '$masonry_item_width = ' . $masonry_item_width . '<br>';
echo '$masonry_item_padding = ' . $masonry_item_padding . '<br>';
$skizzar_masonry_item_width = $masonry_item_width;
$skizzar_masonry_item_padding = $masonry_item_padding;
$skizzar_double_width_size = $masonry_item_width*2 +$masonry_item_padding;
echo 'after moving the fields to an unnecessary intermediary field<br>';
echo '$skizzar_masonry_item_width = ' . $skizzar_masonry_item_width . '<br>';
echo '$skizzar_masonry_item_padding = ' . $skizzar_masonry_item_padding . '<br>';
echo '$skizzar_double_width_size = ' . $skizzar_double_width_size . '<br>';
$output .= '<style>.skizzar_masonry_entry.skizzar_ma_double, .skizzar_masonry_entry.skizzar_ma_double img {width:'.$skizzar_double_width_size.'}</style>';
echo $output;
This should identify which fields are causing you problems.
Also, while testing, always run with display_errors = On. It saves so much time in the long run.
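Putting the two pieces of advice together, a minimal sketch might look like the following. The function name buildDoubleWidthStyle and the px unit are assumptions made for illustration; the original snippet does not show how the width is meant to be expressed.

```php
<?php
// Development only: show every error, warning and notice.
ini_set('display_errors', '1');
error_reporting(E_ALL);

// Hypothetical helper; the 'px' unit is an assumption.
function buildDoubleWidthStyle(int $itemWidth, int $itemPadding): string
{
    $output = '';  // initialise before appending
    $doubleWidth = $itemWidth * 2 + $itemPadding;
    $output .= '<style>.skizzar_masonry_entry.skizzar_ma_double {width:'
        . $doubleWidth . 'px}</style>';
    return $output;
}

echo buildDoubleWidthStyle(300, 10);
```

Passing the width and padding in as parameters also makes it obvious when one of them is missing, because PHP reports the undefined variable at the call site.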

Getting XML with PHP: how to get attributes from 2 nodes with same name

I am getting attributes from XML nodes and saving them to variables with a for loop as such:
for ($i = 0; $i < 10; $i++){
$group = $xml->Competition->Round[0]->Event[$i][Group];
if($group == "MTCH"){
$eventid = $xml->Competition->Round[0]->Event[$i][EventID];
$eventname = $xml->Competition->Round[0]->Event[$i][EventName];
$teamaname = $xml->Competition->Round[0]->Event[$i]->EventSelections[0][EventSelectionName];
$teambname = $xml->Competition->Round[0]->Event[$i]->EventSelections[1][EventSelectionName];
echo "<br/>" . $eventid . ": " . $eventname . ", " . $teamaname . "VS" . $teambname;
}//IF
}//FOR
I can save each Event[EventID] and each Event[EventName], but I cannot get the EventSelections[EventSelectionName] values to save.
I am guessing this is because there are multiple (2) <EventSelections> elements for each <Event>, which is why I tried to get them individually using [0] and [1].
The part of the XML file in question looks like:
<Event EventID="1008782" EventName="Collingwood v Fremantle" Venue="" EventDate="2014-03-14T18:20:00" Group="MTCH">
<Market Type="Head to Head" EachWayPlaces="0">
<EventSelections BetSelectionID="88029974" EventSelectionName="Collingwood">
<Bet Odds="2.10" Line=""/>
</EventSelections>
<EventSelections BetSelectionID="88029975" EventSelectionName="Fremantle">
<Bet Odds="1.70" Line=""/>
</EventSelections>
</Market>
</Event>
Can anyone point me in the right direction to save the EventSelectionNames to variables?
Rather than looping and checking for $group, use xpath to select data directly:
$xml = simplexml_load_string($x); // assume XML in $x
$group = $xml->xpath("/Event[@Group = 'MTCH']")[0];
echo "ID: $group[EventID], name: $group[EventName]" . PHP_EOL;
If there are always two <EventSelections>, you can:
echo "Team A: " . $group->Market->EventSelections[0]['EventSelectionName'] . PHP_EOL;
echo "Team B: " . $group->Market->EventSelections[1]['EventSelectionName'] . PHP_EOL;
Otherwise, use foreach:
$teamnames = array();
foreach ($group->Market->EventSelections as $es) {
    $teamnames[] = (string) $es['EventSelectionName'];
}
echo "There are " . count($teamnames) . " teams:" . PHP_EOL;
foreach ($teamnames as $teamname) echo $teamname . PHP_EOL;
see it in action: https://eval.in/105642
Note:
The [0] at the end of the line starting with $group = $xml->xpath... requires PHP >= 5.4. If you are on a lower version, update PHP or use:
$group = $xml->xpath("/Event[@Group = 'MTCH']");
$group = $group[0];
Michi's answer is more correct and better coded, but I also found that adding the Market node to my code worked as well:
$teamaname = $xml->Competition->Round[0]->Event[$i]->Market->EventSelections[0]['EventSelectionName'];
$teambname = $xml->Competition->Round[0]->Event[$i]->Market->EventSelections[1]['EventSelectionName'];
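For a file with several events, both ideas can be combined: select every MTCH event with XPath, then walk its Market node. The sketch below uses a cut-down copy of the XML from the question wrapped in a Round element; the second Event (Group="FUT") is invented purely to show that it gets filtered out.

```php
<?php
// Cut-down sample based on the question; the FUT event is made up.
$x = <<<XML
<Round>
  <Event EventID="1008782" EventName="Collingwood v Fremantle" Group="MTCH">
    <Market Type="Head to Head">
      <EventSelections EventSelectionName="Collingwood"/>
      <EventSelections EventSelectionName="Fremantle"/>
    </Market>
  </Event>
  <Event EventID="1" EventName="Other" Group="FUT"/>
</Round>
XML;

$xml = simplexml_load_string($x);
$lines = [];
// '//Event' matches Event elements at any depth; @Group tests the attribute.
foreach ($xml->xpath("//Event[@Group='MTCH']") as $event) {
    $teams = [];
    foreach ($event->Market->EventSelections as $selection) {
        // Cast to string so plain text is stored, not a SimpleXMLElement.
        $teams[] = (string) $selection['EventSelectionName'];
    }
    $lines[] = $event['EventID'] . ': ' . implode(' vs ', $teams);
}
echo implode(PHP_EOL, $lines), PHP_EOL;
```

This avoids both the fixed loop bound of 10 and the assumption that every event has exactly two selections.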

Setting variable variables inside a foreach

I am trying to use $value inside the $feed_title variable. And generate all 200 $feed_title variables.
What I am trying to accomplish would look like this:
Feed Url: http://something.com/term/###/feed
Feed Title: Some Title
Where the ### varies from 100-300.
I am using the following code, and getting the urls, but not sure how to get the titles for each feed:
$arr = range(100,300);
foreach($arr as $key=>$value)
{
unset($arr[$key + 1]);
$feed_title = simplexml_load_file('http://www.something.com/term/'
. ??? . '/0/feed');
echo 'Feed URL: <a href="http://www.something.com/term/' . $value
. '/0/feed">http://www.something.com//term/' . $value
. '/0/feed</a><br/> Feed Category: ' . $feed_title->channel[0]->title
. '<br/>';
}
Do I need another loop inside of the foreach? Any help is appreciated.
If you want to get the title of a page, use this function:
function getTitle($Url){
    $str = file_get_contents($Url);
    // Return null when the download fails or the page has no <title>.
    if ($str !== false && preg_match('~<title>(.*?)</title>~is', $str, $title)) {
        return trim($title[1]);
    }
    return null;
}
Here's some sample code:
<?php
function getTitle($Url){
    $str = file_get_contents($Url);
    // Return null when the download fails or the page has no <title>.
    if ($str !== false && preg_match('~<title>(.*?)</title>~is', $str, $title)) {
        return trim($title[1]);
    }
    return null;
}
$arr = range(300,305);
foreach($arr as $value)
{
$feed_title = getTitle('http://www.translate.com/portuguese/feed/' . $value);
echo 'Feed URL: http://www.translate.com/portuguese/feed/' . $value . '<br/>
Feed Category: ' . $feed_title . '<br/>';
}
?>
This gets the title from translate.com pages. I just limited the number of pages for faster execution.
Just change the getTitle to your function if you want to get the title from xml.
Instead of using an array created with range, use a for loop as follows:
for($i = 100; $i <= 300; $i++){
$feed = simplexml_load_file('http://www.something.com/term/' . $i . '/0/feed');
echo 'Feed URL: http://www.something.com/term/' . $i . '/0/feed/ <br /> Feed category: ' . $feed->channel[0]->title . '<br/>';
}
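With a couple of hundred feeds, some requests will probably fail or return something that is not valid XML, and reading ->channel from a failed load raises a warning. Here is a minimal sketch of a guard, assuming the feed body has already been downloaded into a string; channelTitle is a hypothetical helper name and the sample feed is made up.

```php
<?php
// Hypothetical helper: returns the channel title of an RSS document,
// or null when the XML cannot be parsed or has no channel title.
function channelTitle(string $rss): ?string
{
    // Collect parse errors internally instead of emitting warnings.
    libxml_use_internal_errors(true);
    $feed = simplexml_load_string($rss);
    if ($feed === false || !isset($feed->channel->title)) {
        return null;
    }
    return (string) $feed->channel->title;
}

// Made-up sample feed.
$sample = '<rss version="2.0"><channel><title>Some Title</title></channel></rss>';
var_dump(channelTitle($sample));    // string(10) "Some Title"
var_dump(channelTitle('not xml'));  // NULL
```

Inside the for loop, a null result can simply be skipped or printed as a placeholder.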

How to fetch youtube id from the youtube url? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
RegEx pattern to get the YouTube video ID from any YouTube URL
I have stored a YouTube URL in the database. I want to fetch only the YouTube ID from the URL, that is, extract the ID (6FjfewWAGdE) from the URL below.
$youtubeVal = 'http://www.youtube.com/embed/6FjfewWAGdE?feature=player_detailpage';
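You can do this in JavaScript with a regular expression:
function getYoutubeId(url) {
    // "embed\/" includes the slash so that the captured ID does not start with "/".
    var regExp = /^.*(youtu\.be\/|v\/|u\/\w\/|embed\/)([^#&?]*).*/;
    var match = url.match(regExp);
    if (match && match[2].length == 11) {
        return match[2];
    }
    return null;
}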
Youtube URLs come in a variety of formats (with the embed/<id> thing or with the watch?v=<id>). Find the URL type(s) you want to understand and build regular expressions to extract them correctly.
Here is a working non-regexp solution in PHP:
function GetYoutubeID($url) {
    $temp = parse_url($url);
    if (isset($temp['query'])) {
        parse_str($temp['query'], $temp2);
        if (isset($temp2['v'])) {
            return $temp2['v'];
        }
    }
    if (isset($temp['path'])) {
        $temp2 = explode("/", $temp['path']);
        foreach ($temp2 as $value) {
            if (strlen($value) == 11 || strlen($value) == 10) {
                return $value;
            }
        }
    }
    return "no ID?";
}
echo $youtubeID = GetYoutubeID('http://www.youtube.com/embed/6FjfewWAGdE?feature=player_detailpage') . "\n";
echo $youtubeID = GetYoutubeID('https://www.youtube.com/embed/6FjfewWAGdE') . "\n";
echo $youtubeID = GetYoutubeID('https://youtube.com/embed/6FjfewWAGdE') . "\n";
echo $youtubeID = GetYoutubeID('http://www.youtube.com/watch?v=6FjfewWAGdE') . "\n";
echo $youtubeID = GetYoutubeID('http://www.youtube.com/watch?v=6FjfewWAGdE&feature=player_detailpage') . "\n";
echo $youtubeID = GetYoutubeID('www.youtu.be/6FjfewWAGdE') . "\n";
echo $youtubeID = GetYoutubeID('youtu.be/6FjfewWAGdE') . "\n";
echo $youtubeID = GetYoutubeID('youtube.com/watch?v=6FjfewWAGdE') . "\n";
echo $youtubeID = GetYoutubeID('https://www.youtube.com/watch?v=6FjfewWAGdE&feature=youtu.be') . "\n";
Replace the whole body of the function with the following for a more compact solution based on a regular expression:
preg_match('/^(http:\/\/|https:\/\/|.*?)(www.|.*?)(youtube.com|youtu.be)\/(embed\/|watch\?v=|.*?)(.*?)(\?|\&|$)/is', $url, $matches);
if (isset($matches[5])) {
    return $matches[5];
}
return "no ID?";
The regular expression solution also works in JavaScript:
function GetYoutubeID(url) {
    var regExp = /^(http:\/\/|https:\/\/|.*?)(www.|.*?)(youtube.com|youtu.be)\/(embed\/|watch\?v=|.*?)(.*?)(\?|\&|$)/;
    var match = url.match(regExp);
    if (match) {
        return match[5];
    }
    return "no ID?";
}
console.log(GetYoutubeID('http://www.youtube.com/embed/6FjfewWAGdE?feature=player_detailpage'));
console.log(GetYoutubeID('https://www.youtube.com/embed/6FjfewWAGdE'));
console.log(GetYoutubeID('https://youtube.com/embed/6FjfewWAGdE'));
console.log(GetYoutubeID('http://www.youtube.com/watch?v=6FjfewWAGdE'));
console.log(GetYoutubeID('http://www.youtube.com/watch?v=6FjfewWAGdE&feature=player_detailpage'));
console.log(GetYoutubeID('www.youtu.be/6FjfewWAGdE'));
console.log(GetYoutubeID('youtu.be/6FjfewWAGdE'));
console.log(GetYoutubeID('youtube.com/watch?v=6FjfewWAGdE'));
console.log(GetYoutubeID('https://www.youtube.com/watch?v=6FjfewWAGdE&feature=youtu.be'));
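If the URL always has exactly the form http://www.youtube.com/embed/<id>, you can also take the 11 characters that follow the 29-character prefix:
youtubeVal.substring(29, 40);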
