Create array of specific substrings - php

We use a custom CMS, built with PHP and MySQL.
I have a customer who embeds youtube videos in the content of the site. That is one string, that he can edit with CKeditor. That all works just fine.
He now wants to have those videos displayed on a different location within the same page.
I do not want to create a separate input field in the system just for this, for multiple reasons.
The solution I need is this:
I want to extract the (multiple) <iframe>youtube blah blah</iframe> elements from the content string and create an array of iframe strings. Then I can display them elsewhere on the page.
To keep the videos from showing in the original content location, I can use preg_replace to strip the iframes out of the content string.
However, I have no idea how to fetch those substrings and build that new array in PHP.
Hope you have an idea and that my explanation is clear.
EDIT after getting the answer from Michel
The complete code I am using now:
$string = '<iframe>youtube iframe</iframe>Some cool text in between blahblah<iframe>moreyoutube</iframe>';
//catch the iframes
$iframe=array();
$parts=explode('<iframe',$string);
if (count($parts) > 1){ //make sure a string without iframes does not end up in the array
    foreach($parts as $p){
        if( strpos($p,'youtube') !== false ){
            $v=explode('</iframe>',$p);
            $iframe[]= '<iframe'.$v[0].'</iframe>';
        }
    }
}
//strip out iframes
$string = preg_replace('/<iframe(.*?)<\/iframe>/', '', $string);
This will give you a string without iframes, and an array of iframes to display separately.
Thanks to Michel for the answer.

One way of doing it:
Explode the content string on <iframe.
Loop over the resulting array and use strpos to look for the word youtube (to rule out other iframes on the page).
If you find it, put <iframe back in front of the part, cut it off at </iframe>, and add the result to the array.
$string='<div>blabla</div><iframe src="youtube.org.com.uk.sk"></iframe><div>blahblah</div>';
$iframe=array();
$parts=explode('<iframe',$string);
foreach($parts as $p){
    if( strpos($p,'youtube') !== false ){
        $v=explode('</iframe>',$p);
        $iframe[]= '<iframe'.$v[0].'</iframe>';
    }
}
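As an alternative to the explode approach above, the extraction and the stripping can be done in one pass with preg_replace_callback. This is only a sketch: the helper name splitYoutubeIframes and the sample video URL are made up for illustration, and it assumes iframes are not nested and that the word "youtube" appears somewhere in the iframe markup.

```php
<?php
// Hypothetical helper (not from the original answer):
// returns [array of YouTube iframes, content with those iframes removed].
function splitYoutubeIframes(string $content): array
{
    $iframes = [];
    // "s" lets . span newlines, "i" ignores case in the tag name.
    $pattern = '~<iframe\b[^>]*>.*?</iframe>~is';
    $rest = preg_replace_callback(
        $pattern,
        function (array $m) use (&$iframes): string {
            if (stripos($m[0], 'youtube') === false) {
                return $m[0]; // not a YouTube embed: leave it in place
            }
            $iframes[] = $m[0]; // collect the embed
            return '';          // and remove it from the content
        },
        $content
    );
    return [$iframes, $rest ?? $content];
}

[$iframes, $rest] = splitYoutubeIframes(
    '<p>Intro</p><iframe src="https://www.youtube.com/embed/abc"></iframe><p>Outro</p>'
);
print_r($iframes); // one element: the YouTube iframe
echo $rest, "\n";  // <p>Intro</p><p>Outro</p>
```

Unlike the preg_replace call in the question, this leaves non-YouTube iframes in the content.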

Related

Removing portion from scraped array

I am currently scraping a website and trying to remove a portion of the content that I don't want included in the array.
This is the code I have so far:
$content['article'] = $html2->find('.hentry-content',0);
$content['article'] = $content['article']->plaintext;
This returns everything within the .hentry-content class on the website I am gathering content from.
Now the content that gets returned looks like this.
array (
[article] => This is some example filler content please no actual meaning behind random bridge for bridge random you dog tomorrow http://example.com/our-random-mp3.com
)
The end of this output usually includes a link to a random MP3. Is there any way to pull just the content portion of the array without the MP3 link?
If the link is inside an <a> tag, this should work (it needs the element object, so run it before overwriting $content['article'] with ->plaintext):
foreach($content['article']->find('a') as $item) {
$item->outertext = '';
}
echo $content['article']->plaintext;
If the returned text only contains one link to the random MP3 file, you could filter it out with:
$url_pattern = '/https?:\/\/\S+/i'; // unanchored, so it matches a URL anywhere in the text
$content['article'] = trim(preg_replace($url_pattern, '', $content['article']->plaintext));
This will remove all http(s) URLs from the text. It is a simplified variant of the URL pattern from http://code.tutsplus.com/tutorials/8-regular-expressions-you-should-know--net-6149; that pattern is anchored with ^ and $, so on its own it only matches when the whole string is a URL.
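If only the link at the very end of the text should go, the match can be pinned to the end of the string. A minimal sketch, assuming the text is already plain text and the unwanted link starts with http:// or https://; the helper name stripTrailingUrl is hypothetical.

```php
<?php
// Hypothetical helper: drop one URL at the very end of a plain-text string.
function stripTrailingUrl(string $text): string
{
    // \s* eats whitespace before the URL, \S+ is the URL itself,
    // and $ pins the match to the end of the string.
    return preg_replace('~\s*https?://\S+\s*$~i', '', $text) ?? $text;
}

$article = 'This is some example filler content http://example.com/our-random-mp3.com';
echo stripTrailingUrl($article), "\n"; // This is some example filler content
```

URLs in the middle of the text are left alone, because they are not followed directly by the end of the string.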

PHP - substr() only if certain character exists?

I'm using the YouTube Data API v3 to grab video titles and IDs to embed videos on a website. I'm just currently having a problem displaying the title in the way that I want it. Some of the video titles have text in brackets at the end, which I don't want to display on the website. I am currently using:
$videoTitle = substr($videoTitle, 0, strpos($videoTitle, '('));
The problem is that the titles that don't include brackets aren't being displayed. I'm not that experienced with PHP so I'm not sure of a way around this.
Any help will be appreciated. Thanks,
Oli.
First check whether or not the string contains the character, then modify it if it does. Otherwise leave it alone.
You can use strpos to check for the existence of the character, since it returns false if it does not exist in the string.
$videoTitle = strpos($videoTitle, '(') === false ? $videoTitle : substr($videoTitle, 0, strpos($videoTitle, '('));
or
if (strpos($videoTitle, '(') !== false) {
    $videoTitle = substr($videoTitle, 0, strpos($videoTitle, '('));
}
If you just split the string at the (, you can use only the first part as the video title, like:
$splitString = explode('(', $videoTitle);
$videoTitle = $splitString[0];
But video titles vary a lot, so no single method will reliably remove such suffixes.
You can use something like the code below. If you share a link, it will be easier to help.
<?php
$urlarray = explode("(",$videoTitle);
$videoTitle = $urlarray[0];
?>
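The answers above can also be folded into one small function. The following is a sketch, not any of the posters' code: stripBracketSuffix is a made-up name and the sample titles are invented. It relies on strstr() with its third argument set to true, which returns the part of the string before the needle, or false when the needle is absent.

```php
<?php
// Hypothetical helper: cut a title at its first "(" if there is one.
function stripBracketSuffix(string $title): string
{
    $before = strstr($title, '(', true); // false when "(" does not occur
    // Compare with === false so a title that starts with "(" (empty prefix)
    // is not mistaken for "not found".
    return $before === false ? $title : rtrim($before);
}

echo stripBracketSuffix('My Song (Official Video)'), "\n"; // My Song
echo stripBracketSuffix('No brackets here'), "\n";         // No brackets here
```

The rtrim() call removes the space that would otherwise be left in front of the bracket.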

Replace Specific Full Links Between href=" " Using PHP

I have tried searching through related answers but can't quite find something that is suitable for my specific needs. I have quite a few affiliate links within 1,000s of articles on one of my wordpress sites - which all start with the same url format and sub-domain structure:
http://affiliateprogram.affiliates.com/
However, after the initial url format, the query string appended changes for each individual url in order to send visitors to specific pages on the destination site.
I am looking for something that will scan a string of html code (the article body) for all href links that include the specific domain above and then replace THE WHOLE LINK (whatever the query string appended) with another standard link of my choice.
href="http://affiliateprogram.affiliates.com/?random=query_string&page=destination"
gets replaced with
href="http://www.mylink.com"
I would ideally like to do this via php as I have a basic grasp, but if you have any other suggestions I would appreciate all input.
Thanks in advance.
<?php
$html = 'href="http://affiliateprogram.affiliates.com/?random=query_string&page=destination"';
echo preg_replace('#http://affiliateprogram.affiliates.com/([^"]+)#is', 'http://www.mylink.com', $html);
?>
http://ideone.com/qaEEM
Use a regular expression such as:
href="(https?:\/\/affiliateprogram.affiliates.com\/[^"]*)"
$data =<<<EOT
bar
foo
<a name="zz" href="http://affiliateprogram.affiliates.com/?query=random&page=destination&string">baz</a>
EOT;
echo (
preg_replace (
'#href="(https?://affiliateprogram.affiliates.com/[^"]*)"#i',
'href="http://www.mylink.com"',
$data
)
);
output
bar
foo
<a name="zz" href="http://www.mylink.com">baz</a>
$a = '<a class="***" href="http://affiliateprogram.affiliates.com/?random=query_string&page=destination" attr="***">';
$b = preg_replace("/<a([^>]*)href=\"http:\/\/affiliateprogram\.affiliates\.com\/[^\"]*\"([^>]*)>/", "<a\\1href=\"http://www.mylink.com/\"\\2>", $a);
var_dump($b); // <a class="***" href="http://www.mylink.com/" attr="***">
That's quite simple, as you only need a single placeholder for the querystring. .*? would normally do, but you can make it more specific by matching anything that's not a double quote:
$html =
preg_replace('~ href="http://affiliateprogram\.affiliates\.com/[^"]*"~i',
' href="http://www.mylink.com"', $html);
People will probably come around and recommend a long-winded DOMDocument approach, but that's likely overkill for such a task.
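The regex answers above hard-code the domain in the pattern. A slightly more reusable sketch escapes the host with preg_quote() and builds the replacement in a callback, so characters such as $ in the new URL are not interpreted as backreferences. The function name replaceLinksToHost and its parameters are assumptions for illustration; it only handles double-quoted href attributes, as in the question.

```php
<?php
// Hypothetical helper: replace every href that points at $host with $newUrl.
function replaceLinksToHost(string $html, string $host, string $newUrl): string
{
    // preg_quote escapes the dots in the host; "~" is the pattern delimiter.
    $pattern = '~href="https?://' . preg_quote($host, '~') . '/[^"]*"~i';
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($newUrl): string {
            // Escape the URL so it stays a valid attribute value.
            return 'href="' . htmlspecialchars($newUrl, ENT_QUOTES) . '"';
        },
        $html
    ) ?? $html;
}

$html = '<a name="zz" href="http://affiliateprogram.affiliates.com/?query=random&page=destination">baz</a>';
echo replaceLinksToHost($html, 'affiliateprogram.affiliates.com', 'http://www.mylink.com'), "\n";
// <a name="zz" href="http://www.mylink.com">baz</a>
```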

String replace the contents of a div

What I want to do:
I have a div with an id. Whenever ">" occurs I want to replace it with ">>". I also want to prefix the div with "You are here: ".
Example:
<div id="bbp-breadcrumb">Home > About > Contact</div>
Context:
My div contains breadcrumb links for bbPress, but I'm trying to match its format to a site-wide breadcrumb plugin that I'm using for WordPress. The div is generated by a PHP function and output as HTML.
My question:
Do I use PHP or JavaScript to replace the symbols, and how do I go about getting the contents of the div in the first place?
Find the code that's generating the >, and either set the appropriate option (breadcrumb_separator or so) or modify the PHP code to change the separator.
Modifying supposedly static text with JavaScript is not only a maintenance nightmare, extremely brittle, and might lead to a strange rendering (as users see your site being modified if their system is slow), but will also not work in browsers without (or with disabled) JavaScript support.
You could use CSS to add the you are here text:
#bbp-breadcrumb:before {
content: "You are here: ";
}
Browser support:
http://www.quirksmode.org/css/beforeafter_content.html
You could change the > to >> with JavaScript (innerHTML serializes a literal > in text as &gt;, so that is the string to replace):
var htmlElement = document.getElementById('bbp-breadcrumb');
htmlElement.innerHTML = htmlElement.innerHTML.split('&gt;').join('&gt;&gt;');
I don't recommend altering content like this; it is really hacky. You'd better change the output rendering of the breadcrumb plugin if possible. Within WordPress this should be doable.
You can use a regex to match the breadcrumb content, make the changes to it, and put it back in its context.
Check if this helps you:
$the_existing_html = 'somethis before<div id="bbp-breadcrumb">Home > About > Contact</div>something after'; // let's say this is your current html.. just added some context
echo $the_existing_html, '<hr />'; // output.. so that you can see the difference at the end
$pattern ='|<div(.*)bbp-breadcrumb(.*)>(.*)<\/div>|sU'; // find some text that is in a div that has "bbp-breadcrumb" somewhere in its attributes list
$all = preg_match_all($pattern, $the_existing_html, $matches); // match that pattern
$current_bc = $matches[3][0]; // get the text inside that div
$new_bc = 'You are here: ' . str_replace('>', '>>', $current_bc);// replace entity for > with the same thing repeated twice
$the_final_html = str_replace($current_bc, $new_bc, $the_existing_html); // replace the initial breadcrumb with the new one
echo $the_final_html; // output to see where we got
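The snippet above puts the new breadcrumb back with str_replace, which would also touch any identical text elsewhere on the page. A variant that rewrites only what sits inside the div can be sketched with preg_replace_callback. The function name rewriteBreadcrumb is hypothetical, and the sketch assumes the div contains no nested divs and that each separator is a > (or &gt;) with whitespace on both sides.

```php
<?php
// Hypothetical helper: prefix the breadcrumb div and double its separators.
function rewriteBreadcrumb(string $html): string
{
    // Group 1: opening tag, group 2: breadcrumb content, group 3: closing tag.
    $pattern = '~(<div[^>]*\bid="bbp-breadcrumb"[^>]*>)(.*?)(</div>)~s';
    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Only separators surrounded by whitespace are doubled,
            // so the ">" that closes a tag such as <a href="..."> is untouched.
            $inner = preg_replace('~\s(&gt;|>)\s~', ' $1$1 ', $m[2]) ?? $m[2];
            return $m[1] . 'You are here: ' . $inner . $m[3];
        },
        $html
    ) ?? $html;
}

echo rewriteBreadcrumb('before<div id="bbp-breadcrumb">Home > About > Contact</div>after'), "\n";
// before<div id="bbp-breadcrumb">You are here: Home >> About >> Contact</div>after
```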

How to find urls in images

I am trying to extract urls from a large number of google search results. Getting them from the source code is proving to be quite challenging as the delimiters are not clear and not all of the urls are in the code. Is there a tool that can extract urls from a certain area of an image? If so that may be a better solution.
Any help would be much appreciated.
Try using the JSON/Atom Custom Search API instead: http://code.google.com/apis/customsearch/v1/overview.html. It gives you 100 api calls per day, something you can increase to 10000 per day, if you pay.
Use this excellent lib: http://simplehtmldom.sourceforge.net/manual.htm
// Grab the source code
$html = file_get_html('http://www.google.com/');
// Find all anchors; returns an array of element objects
$ret = $html->find('a');
// Read an attribute from each element (for a valueless attribute such as checked or selected, this returns true or false)
foreach ($ret as $element) {
    $value = $element->href;
}
Edit:
All "natural" search URLs seem to be in the #res div. With Simple HTML DOM, find the first #res, then all links inside it. I don't remember the exact syntax, but it should be something like:
$ret = $html->find('div[id=res]', 0)->find('a');
or maybe
$html->find('div[id=res] a');
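The same idea can be sketched without a third-party library, using PHP's built-in DOMDocument and DOMXPath in place of Simple HTML DOM. The function name linksInsideId and the sample markup are made up for illustration, and the sketch assumes the id contains no double quotes, since it is inserted directly into the XPath expression.

```php
<?php
// Hypothetical helper: collect the href of every link inside the element with the given id.
function linksInsideId(string $html, string $id): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep sloppy markup from raising warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $urls = [];
    // Any <a> with an href attribute, at any depth below the element with that id.
    foreach ($xpath->query('//*[@id="' . $id . '"]//a[@href]') as $a) {
        $urls[] = $a->getAttribute('href');
    }
    return $urls;
}

$page = '<html><body><a href="/nav">nav</a><div id="res">'
      . '<a href="http://example.com/a">A</a><p><a href="http://example.com/b">B</a></p>'
      . '</div></body></html>';
print_r(linksInsideId($page, 'res')); // the two example.com links, not /nav
```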
