Regular expression replacement for gettyimages - php

I have editors giving me the embed code from gettyimages in this format:
<div class="getty embed image" style="background-color:#fff;display:inline-block;font-family:'Helvetica Neue',Helvetica,Arial,sans-serif;color:#a7a7a7;font-size:11px;width:100%;max-width:594px;">
<div style="padding:0;margin:0;text-align:left;">Embed from Getty Images</div>
<div style="overflow:hidden;position:relative;height:0;padding:65.656566% 0 0 0;width:100%;">
<iframe src="//embed.gettyimages.com/embed/473144498?et=jFJ38un7Qy1YOLsguPNmmA&viewMoreLink=on&sig=YGqYtdBCwZUYgO864KJJ6ulXuBuS1glNjjOGOHCJ28M=&caption=true" width="594" height="390" scrolling="no" frameborder="0" style="display:inline-block;position:absolute;top:0;left:0;width:100%;height:100%;margin:0;"></iframe>
</div><p style="margin:0;"></p></div>
I can't control the image size with all those extra div tags from Getty Images, so I tried to reduce the markup to this format:
<iframe src="//embed.gettyimages.com/embed/473144498?et=jFJ38un7Qy1YOLsguPNmmA&viewMoreLink=on&sig=YGqYtdBCwZUYgO864KJJ6ulXuBuS1glNjjOGOHCJ28M=&caption=true" width="594" height="390"></iframe>
But I've had no luck so far. Can anyone help me?

I'd use the DOMDocument parser for this, not a regex.
<?php
$string = '<div class="getty embed image" style="background-color:#fff;display:inline-block;font-family:\'Helvetica Neue\',Helvetica,Arial,sans-serif;color:#a7a7a7;font-size:11px;width:100%;max-width:594px;">
<div style="padding:0;margin:0;text-align:left;">Embed from Getty Images</div>
<div style="overflow:hidden;position:relative;height:0;padding:65.656566% 0 0 0;width:100%;">
<iframe src="//embed.gettyimages.com/embed/473144498?et=jFJ38un7Qy1YOLsguPNmmA&viewMoreLink=on&sig=YGqYtdBCwZUYgO864KJJ6ulXuBuS1glNjjOGOHCJ28M=&caption=true" width="594" height="390" scrolling="no" frameborder="0" style="display:inline-block;position:absolute;top:0;left:0;width:100%;height:100%;margin:0;"></iframe>
</div><p style="margin:0;"></p></div>';
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($string);
libxml_clear_errors();
$doc->getElementsByTagName('iframe')->item(0)->removeAttribute('style');
echo $doc->saveHTML();
Output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div class="getty embed image" style="background-color:#fff;display:inline-block;font-family:'Helvetica Neue',Helvetica,Arial,sans-serif;color:#a7a7a7;font-size:11px;width:100%;max-width:594px;">
<div style="padding:0;margin:0;text-align:left;">Embed from Getty Images</div>
<div style="overflow:hidden;position:relative;height:0;padding:65.656566% 0 0 0;width:100%;">
<iframe src="//embed.gettyimages.com/embed/473144498?et=jFJ38un7Qy1YOLsguPNmmA&viewMoreLink=on&sig=YGqYtdBCwZUYgO864KJJ6ulXuBuS1glNjjOGOHCJ28M=&caption=true" width="594" height="390" scrolling="no" frameborder="0"></iframe>
</div><p style="margin:0;"></p></div></body></html>
This assumes you only have one iframe. If you have several, assign
$doc->getElementsByTagName('iframe')
to a variable and then iterate over it.
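For the multiple-iframe case, a minimal sketch might look like the following. It assumes the goal is to get only the iframe markup back, without Getty's wrapper divs; the function name stripIframeStyles and the example.com URLs are made up for illustration.

```php
<?php
// Hypothetical helper: returns each <iframe> found in $html, minus its style attribute.
function stripIframeStyles(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings about embed markup quiet
    $doc->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    foreach ($doc->getElementsByTagName('iframe') as $iframe) {
        $iframe->removeAttribute('style');
        // Passing a node to saveHTML() serializes only that node, not the whole document.
        $result[] = $doc->saveHTML($iframe);
    }
    return $result;
}

$html = '<div><iframe src="//example.com/a" width="594" height="390" style="margin:0;"></iframe></div>'
      . '<div><iframe src="//example.com/b" width="300" height="200" style="margin:0;"></iframe></div>';

foreach (stripIframeStyles($html) as $iframe) {
    echo $iframe, "\n";
}
```

Because saveHTML() is called with the iframe node, the doctype, html, body and wrapper div elements are left out of the result.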
If you only want the height, width, and src attributes, it is probably better to select those and then build the element yourself. Otherwise you would have to remove every attribute the user could add.
So, modifying the above approach:
$iframes = $doc->getElementsByTagName('iframe');
foreach($iframes as $iframe) {
$src = $iframe->getAttribute('src');
$height = $iframe->getAttribute('height');
$width = $iframe->getAttribute('width');
echo "<iframe src='$src' height='$height' width='$width'></iframe>";
}
This would give:
<iframe src='//embed.gettyimages.com/embed/473144498?et=jFJ38un7Qy1YOLsguPNmmA&viewMoreLink=on&sig=YGqYtdBCwZUYgO864KJJ6ulXuBuS1glNjjOGOHCJ28M=&caption=true' height='390' width='594'></iframe>
You could also use double quotes (") to delimit the attributes; you would just need to escape them, or concatenate the variables and use single quotes for the PHP string.
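One way to sidestep the quoting question is sprintf() with a single-quoted format string. The sketch below is only an illustration; the helper name buildIframe is an assumption. It also escapes the values, because getAttribute() returns them decoded (a literal & rather than &amp;).

```php
<?php
// Hypothetical helper: builds a minimal iframe tag from three attribute values.
function buildIframe(string $src, string $width, string $height): string
{
    // htmlspecialchars() encodes &, <, > and both quote types,
    // so a value cannot break out of its attribute.
    return sprintf(
        '<iframe src="%s" width="%s" height="%s"></iframe>',
        htmlspecialchars($src, ENT_QUOTES),
        htmlspecialchars($width, ENT_QUOTES),
        htmlspecialchars($height, ENT_QUOTES)
    );
}

echo buildIframe('//embed.gettyimages.com/embed/473144498?et=abc&caption=true', '594', '390');
```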

To get them all, use # ([a-z]+)="([^"]*)"# via preg_match_all.
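As a rough sketch of that regex route, the snippet below collects attribute names and values with preg_match_all(). The pattern is a slightly adjusted variant that captures the whole value and allows hyphens in names, and it assumes every attribute value is wrapped in double quotes.

```php
<?php
$tag = '<iframe src="//embed.gettyimages.com/embed/473144498" width="594" height="390" scrolling="no"></iframe>';

// PREG_SET_ORDER groups the captures per attribute: [full match, name, value].
preg_match_all('#\s([a-z-]+)="([^"]*)"#i', $tag, $matches, PREG_SET_ORDER);

$attributes = [];
foreach ($matches as $m) {
    $attributes[$m[1]] = $m[2];
}

print_r($attributes);
```

The resulting array maps each attribute name to its value, so the wanted ones (src, width, height) can be picked out by key.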

Related

Find all occurrences and replace one by one PHP

I'm replacing all occurrences of iframe in a string with amp-youtube. This is what I'm doing:
1) I get the video ID from the YouTube URL.
preg_match('/embed\/([\w+\-+]+)[\"\?]/', $string,$video_id);
2) I replace the iframe with amp-youtube, adding the video ID from the URL.
$string = preg_replace( '/<iframe\s+.*?\s+src=(".*?").*?<\/iframe>/',
'<amp-youtube data-videoid="'.$video_id[1].'" width="480" height="270" layout="responsive"></amp-youtube>', (str_replace("https://www.youtube.com/embed/","", $string)));
That works fine for just one occurrence.
But if I have more than one iframe, I can do
preg_match_all('/embed\/([\w+\-+]+)[\"\?]/', $string,$video_id);
to get all the video IDs in the string.
But how can I loop over them to add each ID to the matching amp-youtube data-videoid in the string?
Thanks!
I wish you had posted a minimal sample HTML string and your expected output, but I think I gather your meaning.
Don't use regex to parse HTML. Iterate while there is an existing iframe tag and replace it with your desired tag, using the prepared substring from the iframe's src value.
Code:
$html = <<<HTML
<div>
<p>
<iframe allowfullscreen="" class="media-element file-default" data-fid="2219" data-media-element="1" frameborder="0" height="360" src="https://www.youtube.com/embed/sNEJOm4hSaw" width="640"></iframe>
</p>
<p>
<iframe allowfullscreen="" class="media-element file-default" data-fid="2219" data-media-element="1" frameborder="0" height="360" src="https://www.youtube.com/embed/abcdefghijk" width="640"></iframe>
</p>
</div>
HTML;
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
while ($iframe = $dom->getElementsByTagName('iframe')->item(0)) {
$amp = $dom->createElement('amp-youtube');
$amp->setAttribute('width', '480');
$amp->setAttribute('height', '270');
$amp->setAttribute('layout', 'responsive');
$amp->setAttribute('data-videoid', str_replace("https://www.youtube.com/embed/","", $iframe->getAttribute('src')));
$iframe->parentNode->replaceChild($amp, $iframe);
}
echo $dom->saveHTML();
Output:
<div>
<p>
<amp-youtube width="480" height="270" layout="responsive" data-videoid="sNEJOm4hSaw"></amp-youtube>
</p>
<p>
<amp-youtube width="480" height="270" layout="responsive" data-videoid="abcdefghijk"></amp-youtube>
</p>
</div>
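The str_replace() call above assumes every src starts with exactly https://www.youtube.com/embed/ and carries no query string. If that cannot be guaranteed, a sketch along these lines may be more forgiving; the helper name youtubeIdFromEmbedUrl is hypothetical, and it simply treats the last path segment as the ID.

```php
<?php
// Hypothetical helper: returns the last path segment of an embed URL, or null if there is none.
function youtubeIdFromEmbedUrl(string $src): ?string
{
    $path = parse_url($src, PHP_URL_PATH);   // e.g. "/embed/sNEJOm4hSaw"
    if (!is_string($path)) {
        return null;
    }
    $id = basename($path);
    return $id !== '' ? $id : null;
}

echo youtubeIdFromEmbedUrl('https://www.youtube.com/embed/sNEJOm4hSaw?rel=0'), "\n";
echo youtubeIdFromEmbedUrl('//www.youtube.com/embed/abcdefghijk'), "\n";
```

Inside the loop, the result would take the place of the str_replace() expression passed to setAttribute('data-videoid', ...).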

Regex: find an iframe in text and add the allowfullscreen attribute (PHP)

I have this HTML text:
<p>Is it my awesome text, and down I place my iframe with video,
withot allowfullscreen attribute</p>
<p><iframe src="site.com/video.ogg" width="500" height="400"></iframe></p>
How can I add allowfullscreen with a regex? This is what I tried:
<?php preg_replace('iframe', 'allowfullscreen'); ?>
If allowfullscreen already exists, it should not be added again.
Use the regex from this example:
<?php
$no_fs = <<<HTML
<p>Is it my awesome text, and down I place my iframe with video,
withot allowfullscreen attribute</p>
<p><iframe src="site.com/video.ogg" width="500" height="400"></iframe></p>
HTML;
$has_fs = <<<HTML
<p>Is it my awesome text, and down I place my iframe with video,
withot allowfullscreen attribute</p>
<p><iframe src="site.com/video.ogg" allowfullscreen width="500" height="400"></iframe></p>
HTML;
echo preg_replace('/(<iframe(?:[^>](?!allowfullscreen))+)>/', '$1 allowfullscreen>', $no_fs);
echo "<br>";
echo preg_replace('/(<iframe(?:[^>](?!allowfullscreen))+)>/', '$1 allowfullscreen>', $has_fs);
See the output in this ideone.com snippet.
With this regex you won't add allowfullscreen to iframes already having this attribute anywhere between < and >.
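For comparison, the same check can be done with DOMDocument instead of a regex. This is only a sketch under the assumption that the input is an HTML fragment; the wrapper div is an implementation detail added here so the fragment has a single root, and it is dropped again when the result is serialized.

```php
<?php
$html = '<p>Some text</p>'
      . '<p><iframe src="site.com/video.ogg" width="500" height="400"></iframe></p>'
      . '<p><iframe src="site.com/other.ogg" allowfullscreen width="500" height="400"></iframe></p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
// The wrapper div gives the fragment one root element, which LIBXML_HTML_NOIMPLIED relies on.
$dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

foreach ($dom->getElementsByTagName('iframe') as $iframe) {
    // Only touch iframes that do not carry the attribute yet.
    if (!$iframe->hasAttribute('allowfullscreen')) {
        $iframe->setAttribute('allowfullscreen', '');
    }
}

// Serialize the children of the wrapper so the wrapper itself is not part of the output.
$out = '';
foreach ($dom->documentElement->childNodes as $child) {
    $out .= $dom->saveHTML($child);
}
echo $out;
```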

regex replace iframe/object by youtube video id

I have HTML like the example below. It contains three videos, some embedded with an iframe tag and others with an object tag. I have the YouTube video IDs, and I want to replace the iframe/object tag for a given video ID with some text (for example, "Video can't be shown here").
MY HTML
<p>Video 1</p>
<p><iframe width="604" height="453" src="http://www.youtube.com/embed/TOsGAxFcYls?feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>Video 2</p>
<p><iframe width="604" height="340" src="http://www.youtube.com/embed/Y-AYC3_DbpY?feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>Video 3</p>
<object width="560" height="315"><param name="movie" value="//www.youtube.com/v/-1jKtYuXkrQ?version=3&hl=en_US"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="//www.youtube.com/v/-1jKtYuXkrQ?version=3&hl=en_US" type="application/x-shockwave-flash" width="560" height="315" allowscriptaccess="always" allowfullscreen="true"></embed></object>
Now, I want to replace videos 1 and 3. I have both video IDs.
video 1 = TOsGAxFcYls
video 3 = -1jKtYuXkrQ
Now, I want to replace both iframe and object by particular text.
Expected output
<p>Video 1</p>
<p><strong>Video 1 has been removed video id (TOsGAxFcYls)</strong></p>
<p>Video 2</p>
<p><iframe width="604" height="340" src="http://www.youtube.com/embed/Y-AYC3_DbpY?feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>Video 3</p>
<strong>Video 3 has been removed video id (-1jKtYuXkrQ)</strong>
Note: I may add custom replacement text for each video that is replaced.
Please help me out with a regular expression to do the above job!
Here is a simple solution: just add the IDs and their replacement texts to the $video_ids array.
$html= ... // your html here
$video_ids = array
(
array("TOsGAxFcYls","First replacement text"),
array("Y-AYC3_DbpY","Second replacement text")
);
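foreach ($video_ids as $video_id) {
    // escape the ID so characters such as "-" are treated literally
    $id = preg_quote($video_id[0], '/');
    // match one <object>...</object> containing the ID; the lookahead keeps the match inside a single object
    $patt = '/<object\b(?:(?!<\/object>).)*?' . $id . '(?:(?!<\/object>).)*<\/object>/is';
    $html = preg_replace($patt, $video_id[1], $html);
    // match one <iframe> whose src attribute contains the ID
    $patt = '/<iframe\b[^>]*\bsrc="[^"]*' . $id . '[^"]*"[^>]*>.*?<\/iframe>/is';
    $html = preg_replace($patt, $video_id[1], $html);
}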
echo $html; // here are your changed values
Here is some sample code that I believe has the effect you're asking for:
<?php
$text = '<iframe width="604" height="340" src="http://www.youtube.com/embed/Y-AYC3_DbpY?feature=oembed" frameborder="0" allowfullscreen></iframe>';
$matches = array();
$video_id = "Y-AYC3_DbpY";
preg_match('/<iframe.*?src\=".*?'.$video_id.'.*.<\/iframe>/i', $text, $matches);
if(!empty($matches)){
//replace iframe;
}
else{
//do something else
}
?>
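A DOM-based variant of the same idea is sketched below, in case regex proves too brittle. It is not taken from either answer: the shortened sample markup and the $replacements map are assumptions, with the replacement texts copied from the expected output in the question.

```php
<?php
$html = '<p>Video 1</p>'
      . '<p><iframe src="http://www.youtube.com/embed/TOsGAxFcYls?feature=oembed"></iframe></p>'
      . '<p>Video 2</p>'
      . '<p><iframe src="http://www.youtube.com/embed/Y-AYC3_DbpY?feature=oembed"></iframe></p>'
      . '<p>Video 3</p>'
      . '<object><param name="movie" value="//www.youtube.com/v/-1jKtYuXkrQ?version=3"></param></object>';

// Video ID => replacement text.
$replacements = [
    'TOsGAxFcYls' => 'Video 1 has been removed video id (TOsGAxFcYls)',
    '-1jKtYuXkrQ' => 'Video 3 has been removed video id (-1jKtYuXkrQ)',
];

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

foreach ($replacements as $id => $text) {
    // The IDs here contain no quotes, so placing them in an XPath string literal is safe.
    $query = sprintf(
        '//iframe[contains(@src, "%1$s")] | //object[.//param[contains(@value, "%1$s")] or .//embed[contains(@src, "%1$s")]]',
        $id
    );
    // iterator_to_array() copies the node list before any node is replaced.
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        $strong = $dom->createElement('strong');
        $strong->appendChild($dom->createTextNode($text));
        $node->parentNode->replaceChild($strong, $node);
    }
}

$out = $dom->saveHTML();
echo $out;
```

Video 2 is left untouched because its ID is not in the map.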

PHP - How to select random tag in the string

For example, I have this string containing some number of iframe tags (but there can be also some text or links, so the point is to select only iframe tags):
<p><iframe frameborder="0" height="180" src="http://www.mixcloud.com/widget/iframe/?feed=http%3A%2F%2Fwww.mixcloud.com%2Fskaph%2Fskaphchapeau-rouge-2022014%2F&embed_type=widget_standard&embed_uuid=9ff7c333-5c68-40d6-b9c7-b475c6a8d297&hide_tracklist=1&replace=0&hide_cover=1" width="600" ></iframe></p>
<p><iframe frameborder="0" height="180" src="http://www.mixcloud.com/widget/iframe/?feed=http%3A%2F%2Fwww.mixcloud.com%2Fskaph%2Fx-tract-podcast-night-30-skaph%2F&embed_type=widget_standard&embed_uuid=7186f43a-4bc7-431d-8041-f51366355c44&hide_tracklist=1&replace=0&hide_cover=1" width="600" ></iframe></p>
<p><iframe frameborder="0" height="180" src="http://www.mixcloud.com/widget/iframe/?feed=http%3A%2F%2Fwww.mixcloud.com%2Fskaph%2Fskaphclick-clack-07122013-experiment-liberec%2F&embed_type=widget_standard&embed_uuid=7f2202e6-fd70-45ac-ac1e-6c9dca0ad725&hide_tracklist=1&replace=0&hide_cover=1" width="600" ></iframe></p>
<p><iframe frameborder="0" height="180" src="http://www.mixcloud.com/widget/iframe/?feed=http%3A%2F%2Fwww.mixcloud.com%2FTFSpodcast%2Ftechno-for-soul-podcast-11-mixed-by-skaph%2F&embed_type=widget_standard&embed_uuid=e3f68ffd-488d-4d78-b369-a46c785f59a5&hide_tracklist=1&replace=0&hide_cover=1" width="600" ></iframe></p>
<p><iframe frameborder="0" height="180" src="http://www.mixcloud.com/widget/iframe/?feed=http%3A%2F%2Fwww.mixcloud.com%2Fskaph%2Fskaphtechno-je-v%C5%A1echno-5%2F&embed_type=widget_standard&embed_uuid=2c80035e-27e8-4321-b07d-395e6777b98c&hide_tracklist=1&replace=0&hide_cover=1" width="600" ></iframe></p>
<p><iframe frameborder="0" height="132" src="http://www.mixcloud.com/widget/iframe/?feed=http%3A%2F%2Fwww.mixcloud.com%2Fskaph%2Fskaphtechno-je-v%25C5%25A1echno-vol-2-liberec-experiment-18052013%2F&embed_uuid=f81d24a4-c2f8-4bc5-a10f-7f3fb2243392&stylecolor=&embed_type=widget_standard" width="480"></iframe></p>
<p><iframe frameborder="0" height="132" src="http://www.mixcloud.com/widget/iframe/?feed=http%3A%2F%2Fwww.mixcloud.com%2Fskaph%2Fskaphexperiment-18012013%2F&embed_uuid=e63685e9-901c-4d71-a1c5-69d0afb130d6&stylecolor=&embed_type=widget_standard" width="480"></iframe></p>
<p><iframe frameborder="0" height="132" src="http://www.mixcloud.com/widget/iframe/?feed=http%3A%2F%2Fwww.mixcloud.com%2Fskaph%2Fskaph-renaissance-winter-mix-2012%2F&embed_uuid=5a7e4685-cf6a-4f84-ba1c-13251d5b7f59&stylecolor=&embed_type=widget_standard" width="480"></iframe></p>
<p><iframe frameborder="0" height="132" src="http://www.mixcloud.com/widget/iframe/?feed=http%3A%2F%2Fwww.mixcloud.com%2Fskaph%2Fskaph-mini-technik%2F&embed_uuid=7818bedc-94d0-46b1-8193-4cafcf65ffb5&stylecolor=&embed_type=widget_standard" width="480"></iframe></p>
I need to select a random iframe tag from this string, with both the opening and closing tags included. I suppose I should use something like explode() and then the array_rand() function, but there is no delimiter. The other option that came to mind is regex, but understanding that still escapes me.
Regular expressions are not suitable for parsing HTML. Use a DOM parser instead -- here's a solution using PHP's native DOMDocument class:
$dom = new DOMDocument;
$dom->loadHTML($html);
$iframes = $dom->getElementsByTagName('iframe');
$index = mt_rand(0, $iframes->length - 1);
$random_tag = $iframes->item($index);
In the above code, a random index between 0 and the index of the last tag ($iframes->length - 1) is first chosen with mt_rand(), whose bounds are inclusive, and then the item() method is used to access that tag. Once you have the tag, you can do any further processing. In the demo, I've shown how to extract the src attribute just to show that it's random.
Online demo
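To get the tag back as a string, including the opening and closing tags, the node can be handed to saveHTML(). The following is a minimal sketch with made-up sample markup (the example.com URLs are placeholders); it also guards against the case where the string contains no iframe at all.

```php
<?php
$html = '<p><iframe src="http://example.com/a" width="600"></iframe></p>'
      . '<p>Some text and a <a href="#">link</a></p>'
      . '<p><iframe src="http://example.com/b" width="480"></iframe></p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$iframes = $dom->getElementsByTagName('iframe');
$randomHtml = null;
if ($iframes->length > 0) {
    // mt_rand() is inclusive on both ends, hence length - 1.
    $randomTag = $iframes->item(mt_rand(0, $iframes->length - 1));
    // Serializing the node yields the opening tag, its attributes and the closing tag.
    $randomHtml = $dom->saveHTML($randomTag);
    echo $randomHtml, "\n";
}
```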

Using regex to wrap images in tags

I've been using regex to wrap my images in <a> tags and to alter their paths, etc.
I know using DOM for this is better, having read a lot of threads about wrapping, but I can't work out how to do it.
This is what I'm using:
$comments = (preg_replace('#(<img.+src=[\'"]/uploads/userdirs/admin)(?:.*?/)(.+?)\.(.+?)([\'"].*?>)#i', '<a class="gallery" rel="'.$pagelink.'" href=/uploads/userdirs/'.$who.'/$2.$3>$1/mcith/mcith_$2.$3$4</a>', $comments));
It successfully wraps each image in the tags I want. But only if the string provided ($comments) has the right markup.
<p><img src="/uploads/userdirs/admin/1160501362291.png" alt="" width="1280" height="960" /></p>
<p><img src="/uploads/userdirs/admin/100_Bullets_68_1280x1024.jpg" alt="" width="1280" height="1024" /></p>
When presented like this, it works. I'm using TinyMCE, so it wraps each image in <p> when I insert a line break with Enter. But when I don't do that and just insert images one after another, so that the HTML looks like this, it won't:
<p><img src="/uploads/userdirs/admin/1160501362291.png" alt="" width="1280" height="960" /><img src="/uploads/userdirs/admin/100_Bullets_68_1280x1024.jpg" alt="" width="1280" height="1024" /></p>
It will instead wrap those two images in the same <a> tag, making the output look like this:
<p><a class="gallery" rel="test" href="/uploads/userdirs/admin/100_Bullets_68_1280x1024.jpg">
<img src="/uploads/userdirs/admin/1160501362291.png" alt="" width="1280" height="960">
<img src="/uploads/userdirs/admin/mcith/mcith_100_Bullets_68_1280x1024.jpg" alt="" width="1280" height="1024">
</a></p>
Which is wrong. The output I want is this:
<p><a class="gallery" rel="test2" href="/uploads/userdirs/admin/100_Bullets_68_1280x1024.jpg"><img src="/uploads/userdirs/admin/mcith/mcith_100_Bullets_68_1280x1024.jpg" alt="" width="1280" height="1024"></a></p>
<p><a class="gallery" rel="test2" href="/uploads/userdirs/admin/1154686260226.jpg"><img src="/uploads/userdirs/admin/mcith/mcith_1154686260226.jpg" alt="" width="1280" height="800"></a></p>
I've left out a few details, but here's how I would do it using DOMDocument:
$s = <<<EOM
<p><img src="/uploads/userdirs/admin/1160501362291.png" alt="" width="1280" height="960" /></p>
<p><img src="/uploads/userdirs/admin/100_Bullets_68_1280x1024.jpg" alt="" width="1280" height="1024" /></p>
EOM;
$d = new DOMDocument;
$d->loadHTML($s);
foreach ($d->getElementsByTagName('img') as $img) {
$img_src = $img->attributes->getNamedItem('src')->nodeValue;
if (0 === strncasecmp($img_src, '/uploads/userdirs/admin', 23)) {
$a = $d->createElement('a');
$a->setAttribute('class', 'gallery');
$a->setAttribute('rel', 'whatever');
$a->setAttribute('href', '/uploads/userdirs/username/' . basename($img_src)); // basename() keeps only the file name, so the path is not duplicated
// disconnect image tag from parent
$img->parentNode->replaceChild($a, $img);
// and move to anchor
$a->appendChild($img);
}
}
echo $d->saveHTML();
You should replace .* in your regular expression with [^>]*. The latter means: any character except >. The same applies to .+, which becomes [^>]+. Regular expression quantifiers are greedy, so they match as much as possible; without this additional restriction, a single match ends up spanning both <img> tags.
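The difference is easy to see on a reduced example. The two patterns below are simplified stand-ins for the one in the question, not the full gallery regex:

```php
<?php
$html = '<p><img src="/a.png" alt=""><img src="/b.jpg" alt=""></p>';

// Greedy ".+" runs as far as it can, so a single match swallows both tags.
preg_match_all('#<img.+src="(.+?)".*?>#i', $html, $greedy);

// "[^>]" cannot cross the end of a tag, so each <img> is matched on its own.
preg_match_all('#<img[^>]+src="([^">]+)"[^>]*>#i', $html, $bounded);

echo count($greedy[0]), "\n";   // 1
echo count($bounded[0]), "\n";  // 2
```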
