Retrieve data attribute value using preg_match - php

I have the following HTML:
<div class="video cover" data-thumb="https://i.vimeocdn.com/video/1234567.webp?mw=700&mh=393" style="background-image: url(https://i.vimeocdn.com/video/525930392.webp?mw=700&mh=393);">
I would like to retrieve the data-thumb URL value.
I have attempted to retrieve the value using the following:
$iframe = '<div class="video cover" data-thumb="https://i.vimeocdn.com/video/1234567.webp?mw=700&mh=393" style="background-image: url(https://i.vimeocdn.com/video/525930392.webp?mw=700&mh=393);">';
preg_match('/data-thumb="(.*?)"/', $iframe, $matches);
echo $matches[0];
However, this is not retrieving any matches.
EDIT: Thank you for your help and answers. It appears I made an error with the output of $iframe, which was displaying content from an iframe (doh). So preg_match couldn't target it.

preg_match('/data-thumb="([^"]*)"/', $iframe, $matches);
if (isset($matches[1])) {
    echo $matches[1]; // echo the value of the data-thumb attribute
}
This works for me. Indeed, the original code works too, although $matches[0] holds the whole match (data-thumb="...") while $matches[1] holds only the URL.
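To make the difference between the whole match and the capture group concrete, here is a minimal sketch. The helper name extract_data_thumb is hypothetical (it is not from the question), and the sketch assumes the attribute value is always double-quoted.

```php
<?php
// Hypothetical helper: returns the data-thumb value, or null when the attribute is absent.
function extract_data_thumb(string $html): ?string
{
    // preg_match returns 1 on a match, 0 on no match and false on error.
    if (preg_match('/data-thumb="([^"]*)"/', $html, $matches) !== 1) {
        return null;
    }
    // $matches[0] is the whole match (data-thumb="..."); $matches[1] is the capture group.
    return $matches[1];
}

$iframe = '<div class="video cover" data-thumb="https://i.vimeocdn.com/video/1234567.webp?mw=700&mh=393" style="background-image: url(https://i.vimeocdn.com/video/525930392.webp?mw=700&mh=393);">';
echo extract_data_thumb($iframe), PHP_EOL;
```

Checking the return value first avoids an undefined index notice when the input does not contain the attribute, which is what happened with the iframe content described in the edit above.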

Try this regex:
/data-thumb="[\s\S]*?"/

Related

preg_match_all Pattern Search

My subject is
some html codes<h3 class="r">Some Title</h3>some html codes
My current pattern is:
"/<h3 class="r"><a href="\/url?\?q\=http(...)/"
The result is:
<h3 class="r"><a href="/url?q=http://
I wanted to get the exact url, http://www.somedomain.com/args/
or just <h3 class="r">Some Title</h3> so i can parse it to return the url.
but i could not make it.
Any help would be appreciated. Thank you!
Try this:
'/<h3 class="r"><a href="\/url?\?q\=(http.*)\/"/'
$string = 'some html codes<h3 class="r">Some Title</h3>some html codes';
preg_match_all('!http://(.*)"!', $string, $result_array);
print_r($result_array);
Try it.
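As a rough sketch of the capture-group idea, the following assumes the subject contained a link of the form the question's pattern implies. The $subject string, the &amp;sa=U suffix and the function name extract_result_urls are assumptions made for illustration, since the subject shown in the question has no link in it.

```php
<?php
// Hypothetical helper: collects the URLs that follow "/url?q=" inside <h3 class="r"> links.
function extract_result_urls(string $subject): array
{
    // "~" is used as the delimiter so slashes need no escaping.
    // The group captures from "http" up to the next "&" or closing quote.
    $pattern = '~<h3 class="r"><a href="/url\?q=(http[^&"]+)~';
    preg_match_all($pattern, $subject, $result);
    return $result[1];
}

// Assumed input: the question's subject with a link restored inside the <h3>.
$subject = 'some html codes<h3 class="r"><a href="/url?q=http://www.somedomain.com/args/&amp;sa=U">Some Title</a></h3>some html codes';
print_r(extract_result_urls($subject));
```

Stopping the capture at & or " keeps tracking parameters out of the result, which a greedy .* would not do.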

Using PHP to scrape image url from twitter page

I'm trying to scrape an image URL from Twitter, e.g. 'https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large', using PHP. I found the following PHP code, and file_get_contents is working, but I don't think the regular expression is matching the URL. Can you help debug this code? Thanks in advance.
Here is a snippet from twitter which contains the image:
<div class="media-gallery-image-wrapper">
<img class="large media-slideshow-image" alt="" src="https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large" height="480" width="358">
</div>
Here is the php code:
<?php
$url = 'http://t.co/s54fJgrzrG';
$twitter_page = file_get_contents($url);
preg_match('/(http:\/\/p.twimg.com\/[^:]+):/i', $twitter_page, $matches);
$imgURL = array_pop($matches);
echo $imgURL;
?>
Something like this should provide a URL.
<?php
$url = 'http://t.co/s54fJgrzrG';
$twitter_page = file_get_contents($url);
preg_match_all('!http[s]?:\/\/pbs\.twimg\.com\/[^:]+\.(jpg|png|gif)!i', $twitter_page,$matches);
echo $img_url=$matches[0][0];
?>
Response is
https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg
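It appears that your regular expression is missing part of the beginning of the URL: the host is pbs.twimg.com, not p.twimg.com, and the pattern only allowed http, not https. The scheme is written as https? rather than as a capturing group so that array_pop($matches) still returns the URL and not just the scheme.
preg_match('/(https?:\/\/pbs\.twimg\.com\/[^:]+):/i', $twitter_page, $matches);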
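The pattern can be tried without fetching the page by running it against the snippet from the question. This is a sketch only: the function name extract_twimg_url is hypothetical, and it assumes the image lives under /media/ and ends in jpg, png or gif, as in the example.

```php
<?php
// Hypothetical helper: returns the first pbs.twimg.com image URL found, without the ":large" suffix.
function extract_twimg_url(string $html): ?string
{
    // Accepts http or https, escapes the dots in the host name, and stops
    // before quotes, whitespace or the ":" that introduces the size suffix.
    $pattern = '~https?://pbs\.twimg\.com/media/[^"\'\s:]+\.(?:jpg|png|gif)~i';
    return preg_match($pattern, $html, $m) === 1 ? $m[0] : null;
}

$snippet = '<div class="media-gallery-image-wrapper">
<img class="large media-slideshow-image" alt="" src="https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large" height="480" width="358">
</div>';
echo extract_twimg_url($snippet), PHP_EOL;
```

Using a non-capturing group (?:...) for the extension means $m[0] is the only entry of interest, so there is no ambiguity about which group holds the URL.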

PHP preg_match_all - Get what is inside pattern

I have a lot of difficulty with regex; can anyone help improve my code?
Thank you in advance.
What I need is to get the content inside the [Slide] CONTENT [/Slide] tags.
What I am doing is:
preg_match ('/\[Slide\].*\[\/Slide\]/s', $content, $matches);
$conteudo_slide = $matches[0];
$conteudo_full = preg_replace('/\[Slide\]/s', "", $conteudo_slide);
$conteudo_full = preg_replace('/\[\/Slide\]/s', "", $conteudo_full);
The content of the page is:
<p>[Slide]http://www.gprco-cpa.com/images/industries/corporations.jpg[/Slide]</p>
<p>[Slide]http://www.expatcpa.com/Corporation_HQ.jpg[/Slide]</p>
<p><br />[Slide]</p>
<p><a href="http://localhost/~tiago/main_wordpress/?attachment_id=437" rel="attachment wp-att-437">
<img class="alignright size-full wp-image-437" title="lightbulb"src="http://localhost/~tiago/main_wordpress/wpcontent/uploads/2012/09/lightbulb1.jpg" alt="" width="500" height="334" /></a>[/Slide]</p><p> </p>
<p>[Slide]http://www.youtube.com/watch?v=SHVOyGVQ3Tw&feature=g-all-u[/Slide]</p>
Isn't there a more correct way of doing this?
Thank you.
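Try adding () to capture the content between the tags, and use the lazy .*? so the match stops at the first [/Slide]. The content is then in $matches[1]: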
preg_match ('/\[Slide\](.*?)\[\/Slide\]/s', $content, $matches);
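Since the page contains several [Slide] blocks, preg_match_all with the same pattern returns all of them at once. A minimal sketch follows; the function name extract_slides is hypothetical and the sample content is shortened to two of the question's lines.

```php
<?php
// Hypothetical helper: returns the content of every [Slide]...[/Slide] pair.
function extract_slides(string $content): array
{
    // The "s" modifier lets "." match newlines; ".*?" is lazy, so each match
    // ends at the nearest [/Slide] instead of the last one on the page.
    preg_match_all('/\[Slide\](.*?)\[\/Slide\]/s', $content, $matches);
    // $matches[1] holds only the captured content, so no preg_replace cleanup is needed.
    return array_map('trim', $matches[1]);
}

$content = '<p>[Slide]http://www.gprco-cpa.com/images/industries/corporations.jpg[/Slide]</p>
<p>[Slide]http://www.expatcpa.com/Corporation_HQ.jpg[/Slide]</p>';
print_r(extract_slides($content));
```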

Getting the first image in string with php

I'm trying to get the first image from each of my posts. The code below works great if I only have one image, but if I have more than one it gives me an image that is not always the first.
I really only want the first image. A lot of times the second image is a next button.
$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $texthtml, $matches);
$first_img = $matches[1][0];
Now I can take $first_img and put it in front of the short description:
<img alt="Sara" title="Sara" src="<?php echo $first_img;?>"/>
If you only need the first src attribute, preg_match is enough; there is no need for preg_match_all. Restricting the wildcards to [^>] also keeps the match inside a single tag, so a second image on the same line cannot be picked up instead. Does this work for you?
<?php
$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
preg_match('/<img[^>]+src=[\'"](?P<src>[^\'"]+)[\'"][^>]*>/i', $texthtml, $image);
echo $image['src'];
?>
Don't use regex to parse HTML.
Use an HTML-parsing library or class, such as phpQuery:
require 'phpQuery-onefile.php';
$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sarahehe" title="Saraxd" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
$pq = phpQuery::newDocumentHTML($texthtml);
$img = $pq->find('img:first');
$src = $img->attr('src');
echo "<img alt='foo' title='baa' src='{$src}'>";
Download: http://code.google.com/p/phpquery/
After testing an answer from the question "Using regular expressions to extract the first image source from html codes?", I got better results, with fewer broken image links, than with the answer provided here.
While regular expressions are good for a large variety of tasks, I find they usually fall short when parsing the HTML DOM. The structure of an HTML document is so variable that it is hard to extract a tag accurately (that is, with a 100% success rate and no false positives).
For more consistent results, use http://simplehtmldom.sourceforge.net/, which lets you manipulate HTML.
An example is provided in the answer to the question linked above.
function get_first_image($html) {
    require_once('SimpleHTML.class.php');
    $post_html = str_get_html($html);
    // find('img', 0) returns the first <img> element, or null if there is none
    $first_img = $post_html->find('img', 0);
    if ($first_img !== null) {
        return $first_img->src;
    }
    return null;
}
Enjoy
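If installing a third-party library is not an option, the same idea can be sketched with PHP's bundled DOM extension. This is an alternative to the answers above, not what they use; the function name first_image_src is hypothetical, and the sketch assumes the dom extension is enabled.

```php
<?php
// Hypothetical helper: returns the src of the first <img> in an HTML fragment, or null.
function first_image_src(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML rejects empty input
    }
    // Collect parser warnings about fragment markup internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // item(0) is the first <img> in document order, or null if there is none.
    $img = $doc->getElementsByTagName('img')->item(0);
    return ($img !== null && $img->hasAttribute('src')) ? $img->getAttribute('src') : null;
}

$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
echo first_image_src($texthtml), PHP_EOL;
```

Because the parser walks the document in order, the first image is always the one returned, regardless of how the tags are laid out on lines.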

Create anchors in a page with the content of <h2></h2> in PHP

I have an HTML string in a variable:
$html = "<h1>title</h1><h2>subtitle 1</h2> <h2>subtitle 2</h2>";
I want to create an anchor in each subtitle, named after the subtitle's text, then print the HTML to the browser and also get the subtitles as an array.
I think this can be done with regex. Please help.
I think this will do the trick for you:
$pattern = "|<h2>(.*)</h2>|U";
preg_match_all($pattern, $html, $matches);
foreach ($matches[1] as $match) {
    $html = str_replace($match, "<a name='".$match."' />".$match, $html);
}
$array_of_elements = $matches[1];
Just make sure that $html holds the existing HTML before this code runs. Afterwards, each subtitle will be preceded by an <a name='...' /> anchor named after it, and $array_of_elements will hold the array of matched subtitle texts. Note that str_replace replaces every occurrence of the subtitle text, not only the ones inside <h2> tags.
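A variation on the same approach is to do the replacement and the collection in one pass with preg_replace_callback, so that only text inside <h2> tags is touched. This is a sketch, not the answer's method: the function name add_h2_anchors is hypothetical, and it assumes the <h2> tags carry no attributes, as in the question.

```php
<?php
// Hypothetical helper: returns [html with anchors added, list of subtitle texts].
function add_h2_anchors(string $html): array
{
    $titles = [];
    $result = preg_replace_callback(
        '|<h2>(.*?)</h2>|s',
        function (array $m) use (&$titles) {
            $titles[] = $m[1];
            // Escape the text before using it as an attribute value.
            $name = htmlspecialchars($m[1], ENT_QUOTES);
            return "<h2><a name='{$name}'></a>{$m[1]}</h2>";
        },
        $html
    );
    return [$result, $titles];
}

$html = "<h1>title</h1><h2>subtitle 1</h2> <h2>subtitle 2</h2>";
[$anchored, $titles] = add_h2_anchors($html);
echo $anchored, PHP_EOL;
print_r($titles);
```

The callback runs once per <h2>, so a subtitle whose text also appears elsewhere in the page is not altered outside its heading.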
