extracting link from a string containing html and javascript - php

I have a string that contains the following:
<img data-bind="defaultSrc: {srcDesktop: 'http://desktoplink', srcMobile: 'http://mobilelink', fallback: 'http://baseurl'}" >
I am trying to extract the srcDesktop contained inside the string. I want my final result to yield me with the link http://desktoplink. What is the best way to achieve that other than str_replace? I have a dataset that contains those strings so I am looking for a formula to extract it in php.
Here is how I have been doing it, but there is got to be a more efficient way:
$string = '<img data-bind="defaultSrc: {srcDesktop: \'http://desktoplink\', srcMobile: \'http://mobilelink\', fallback: \'http://baseurl\'}" >';
$test = explode(" ",$string);
echo "<br>".str_replace(",","",str_replace("'","",$test['3']));

You can use preg_match
$string = '<img data-bind="defaultSrc: {srcDesktop: \'http://desktoplink\', srcMobile: \'http://mobilelink\', fallback: \'http://baseurl\'}" >';
preg_match('/.*\bsrcDesktop:\s*(?:\'|\")(.*?)(?:\'|\").*/i', $string, $matches);
if (isset($matches[1])) {
echo trim($matches[1]);
}

you can use DOMDocument and json_decode to get this value, if you can change the code to the code below (added some '-signs):
$string = "<img data-bind=\"'defaultSrc': {'srcDesktop': 'http://desktoplink', 'srcMobile': 'http://mobilelink', 'fallback': 'http://baseurl'}\" >";
$doc = new DOMDocument();
$doc->loadHTML($string);
$data = str_replace('\'','"',$doc->getElementsByTagName('img')[0]->getAttribute('data-bind'));
$json = json_decode('{'.$data.'}');
var_dump($json->defaultSrc->srcDesktop);

Related

Replace string with different strings containing pattern

Say you have a string like
$str = "<img src='i12'><img src='i105'><img src='i12'><img src='i24'><img src='i15'>....";
is it possible to replace every i+n by the nth value of an array called $arr
so that for example <img src='i12'> is replaced by <img src='$arr[12]'>.
If I were you, I'd simply parse the markup, and process/alter it accordingly:
$dom = new DOMDocument;
$dom->loadHTML($str);//parse your markup string
$imgs = $dom->getElementsByTagName('img');//get all images
$cleanStr = '';//the result string
foreach($imgs as $img)
{
$img->setAttribute(
'src',
//get value of src, chop of first char (i)
//use that as index, optionally cast to int
$array[substr($img->getAttribute('src'), 1)]
);
$cleanStr .= $dom->saveXML($img);//get string representation of node
}
echo $cleanStr;//echoes new markup
working demo here
Now in the demo, you'll see the src attributes are replaced with a string like $array[n], the code above will replace the values with the value of an array...
I would use preg_replace for this:
$pattern="/(src=)'\w(\d+)'/";
$replacement = '${1}\'\$arr[$2]\'';
preg_replace($pattern, $replacement, $str);
$pattern="/(src=)'\w(\d+)'/";
It matches blocks of text like src='letter + digits'.
This catches the src= and digit blocks to be able to print them back.
$replacement = '${1}\'\$arr[$2]\'';
This makes the replacement itself.
Test
php > $str = "<img src='i12'><img src='i105'><img src='i12'><img src='i24'><img src='i15'>....";
php > $pattern="/(src=)'\w(\d+)'/";
php > $replacement = '${1}\'\$arr[$2]\'';
php > echo preg_replace($pattern, $replacement, $str);
<img src='$arr[12]'><img src='$arr[105]'><img src='$arr[12]'><img src='$arr[24]'><img src='$arr[15]'>....

Regex preg_replace find image in string WITH img attributes

I'm trying to find ALL images in my blog posts with regex. The code below returns images IF the code is clean and the SRC tag comes right after the IMG tag. However, I also have images with other attributes such as height and width. The regex I have does not pick that up... Any ideas?
The following code returns images that looks like this:
<img src="blah_blah_blah.jpg">
But not images that looks like this:
<img width="290" height="290" src="blah_blah_blah.jpg">
Here is my code
$pattern = '/<img\s+src="([^"]+)"[^>]+>/i';
preg_match($pattern, $data, $matches);
echo $matches[1];
Use DOM or another parser for this, don't try to parse HTML with regular expressions.
$html = <<<DATA
<img width="290" height="290" src="blah.jpg">
<img src="blah_blah_blah.jpg">
DATA;
$doc = new DOMDocument();
$doc->loadHTML($html); // load the html
$xpath = new DOMXPath($doc);
$imgs = $xpath->query('//img');
foreach ($imgs as $img) {
echo $img->getAttribute('src') . "\n";
}
Output
blah.jpg
blah_blah_blah.jpg
Ever think of using the DOM object instead of regex?
$doc = new DOMDocument();
$doc->loadHTML('<img src="http://example.com/img/image.jpg" ... />');
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
echo $tag->getAttribute('src');
}
You'd better to use a parser, but here is a way to do with regex:
$pattern = '/<img\s.*?src="([^"]+)"/i';
The problem is that you only accept \s+ after <img. Try this instead:
$pattern = '/<img\s+[^>]*?src="([^"]+)"[^>]+>/i';
preg_match($pattern, $data, $matches);
echo $matches[1];
Try this:
$pattern = '/<img\s.*?src=["\']([^"\']+)["\']/i';
Single or double quote and dynamic src attr position.

I have to find title but print value how?

My code is given below:-
$text = "<div class='title'>Title</div><div class='content'>This is title</div>";
$words = array('Title');
$words = join("|", $words);
$matches = array();
if ( preg_match('/' . $words . '/i', $text, $matches) ){
echo "Words matched: <br/>";
print_r($matches);
}
else{
echo "Not match";
}
The problem is that in above code I am finding title but i don't want to print title; I want to print this: "This is title" and I am not understanding how I can print this by finding title.
Because title is like keyword that will not change but value which i want to print it is dynamic value and it will change every time, that's why i cannot finding value of title. So how can i do it?
Don't use regex for parsing HTML. Use a DOM Parser instead. In this case, you can use an XPath expression to get the element by class name:
$text = "<div class='title'>Title</div>
<div class='content'>This is title</div>";
$dom = new DOMDocument;
$dom->loadHTML($text);
$xpath = new DOMXPath($dom);
$title = $xpath->query('//*[#class="content"]')->item(0)->nodeValue;
Output:
This is title
This should get you started. If the title is in a different position, you can modify the expression accordingly to retrieve it.

Using php preg to find url and replace it with a second url

I've got a large number of webpages stored in an MySQL database.
Most of these pages contain at least one (and occasionally two) entries like this...
<a href="http://first-url-which-always-ends-with-a-slash/">
<img src="http://second-different-url-which-always-ends-with.jpg" />
</a>
I'd like to just set up a little php loop to go through all the entires replacing the first url with a copy of the second url for that entry.
How can I use preg to:
find the second url from the image tag
replace the first url in the a tag, with a copy of the second url
Is this possible?
see this url
PHP preg match / replace?
see also:- http://php.net/manual/en/function.preg-replace.php
$qp = qp($html);
foreach ($qp->find("img") as $img) {
$img->attr("title", $img->attr("alt"));
}
print $qp->writeHTML();
Though it might be feasible in this simple case to resort to an regex:
preg_replace('#(<img\s[^>]*)(\balt=)("[^"]+")#', '$1$2$3 title=$3', $h);
(It would make more sense to use preg_replace_callback to ensure no title= attribute is present yet.)
You can do following :
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;
$source = "<a href=\"http://first-url-which-always-ends-with-a-slash/\">
<img src=\"http://second-different-url-which-always-ends-with.jpg\" />
</a>";
$dom->loadHTML($source);
$tags = $dom->getElementsByTagName('a');
foreach ($tags as $tag) {
$atag = $tag->getAttribute('href');
$imgTag = $dom->getElementsByTagName('img');
foreach ($imgTag as $img) {
$img->setAttribute('src', $atag);
echo $img->getAttribute('src');
}
}
Thanks for the suggestions i can see how they are better than using Preg.
Even so i finally solved my own question like this...
$result = mysql_query($select);
while ($frow = mysql_fetch_array($result)) {
$page_content = $frow['page_content'];
preg_match("#<img\s+src\s*=\s*([\"']+http://[^\"']*\.jpg[\"']+)#i", $page_content, $matches1);
print_r($matches1);
$imageURL = $matches1[1] ;
preg_match("#<a\s+(?:[^\"'>]+|\"[^\"]*\"|'[^']*')*href\s*=\s(\"http://[^\"]+/\"|'http://[^']+/')#i", $page_content, $matches2);
print_r( $matches2 );
$linkURL = $matches2[1] ;
$finalpage=str_replace($linkURL, $imageURL, $page_content) ;
}

Remove all text within specific tags

I am interesting in removing all the text within the following tags:
<p class="wp-caption-text">Remove this text</p>
Can anybody give me an idea of how this can be done in php?
Thank you very much
Get rid of the tag and content inside of it:
$content = preg_replace('/<p\sclass=\"wp\-caption\-text\">[^<]+<\/p>/i', '', $content);
or if you want to preserve the tags:
$content = preg_replace('/(<p\sclass=\"wp\-caption\-text\">)[^<]+(<\/p>)/i', '$1$2', $content);
As bit higher-level alternative to regular expressions.
You can process with DOM. You can match all nodes you're looking for with XPath //p[#class="wp-caption-text"].
For example:
$doc = new DOMDocument();
$doc->loadHTML($yourHTMLasString);
$xpath = new DOMXPath($doc);
$query = '//p[#class="wp-caption-text"]';
$entries = $xpath->query($query);
foreach ($entries as $entry) {
$entry->textContent = '';
}
echo $doc->saveHTML();
Try this:
$string = '<p class="wp-caption-text">Remove this text</p>';
$pattern = '/(.*<p .*>).*(<\/p>.*)/';
$replacement = '$1$2';
echo preg_replace($pattern, $replacement, $string);
if its always the same tag you could simply do search for the string. use the position resulting to substring from it to the closing tag.
Or you could use a regular expression, there are good ones posted here that can help you.

Categories