Find specific place in string and get data from it - php

I have a string with multiple image tags in it, like this:
<img src="/files/028ou2p5g/blogs/9d66329f4/5844644f69fe7-64.jpg">
I want to find the FIRST such tag and get the image name from it:
5844644f69fe7-64.jpg
How can this be done in PHP, assuming there is a lot of other text and tags in the string?

You should use something like what @moopet suggested. This is the code, but please give credit to @moopet.
$str = '<img src="/files/028ou2p5g/blogs/9d66329f4/5844644f69fe7-64.jpg">';
$doc = new DOMDocument();
$doc->loadHTML($str);
$first_img = $doc->getElementsByTagName("img")[0];
var_dump( basename($first_img->getAttribute('src')) );

Don't use regex for this. Use PHP's DOM parser or an alternative to extract the tags, then use PHP's basename() function on the value of the src attribute to extract the filename.
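A slightly more defensive sketch of the DOM approach, assuming PHP 7.1+ with the DOM extension enabled. firstImageName() is a hypothetical helper name, not something from the answers here; it silences the parser warnings that HTML fragments tend to trigger and returns null when there is no image at all.

```php
<?php
// Hypothetical helper: file name of the first <img> in $html, or null if there is none.
function firstImageName(string $html): ?string
{
    $doc = new DOMDocument();
    // Fragments are rarely valid documents; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $img = $doc->getElementsByTagName('img')->item(0);
    if ($img === null) {
        return null;
    }

    // Take only the path part so a query string does not end up in the file name.
    $path = parse_url($img->getAttribute('src'), PHP_URL_PATH);
    return is_string($path) ? basename($path) : null;
}

$html = '<p>Intro</p><img src="/files/028ou2p5g/blogs/9d66329f4/5844644f69fe7-64.jpg"><img src="/other/b.png">';
echo firstImageName($html), PHP_EOL;
```

Going through parse_url() first is a design choice for URLs such as image.jpg?v=2; if your src values never carry a query string, basename() alone is enough.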

Use preg_match_all() to find all occurrences and then take the first one.
Example:
<?php
preg_match_all('/<\s*img[^<>]+?src\s*=\s*[\'\"][^<>\'\"]+?\/([^<>\'\"\/]+\.jpg)/', $html, $matches, PREG_SET_ORDER);
// $matches[0] is the first match; its capture group 1 holds the file name.
var_dump($matches[0][1]);
?>

Related

Get substring under condition

I have a string $content which looks like this:
<h1>Or Any Other tags except img or nothing</h1>
...
<img src="{{media url="image_name.png"}}" alt="image_test" />
...
<h1>Or Any Other tags except img or nothing</h1>
So the minimal content of the string is
<img src="{{media url="dynamic_image_name.png"}}" alt="dynamic_image_test_alt" />
What I want is a way to extract, alter, and replace this specific line with a new one.
At first I wrote this:
protected function getStringBetween($str,$from,$to)
{
$sub = substr($str, strpos($str,$from)+strlen($from),strlen($str));
return substr($sub,0,strpos($sub,$to));
}
Using " as both the $from and $to arguments gets me the filename, which is enough to generate what I want.
I would like to do something like this:
$generatedContent = "<b>Hi test</b>";
$newContent = alterateContent($content, $generatedContent);
And the $newContent output needs to be:
<h1>Or Any Other tags except img or nothing</h1>
...
<b>Hi test</b>
...
<h1>Or Any Other tags except img or nothing</h1>
I would rarely recommend using regular expressions to parse HTML, but in your case, since your goal is to alter something in the database, parsing the HTML and then saving it again might accidentally alter other things you'd want left unchanged, such as the formatting.
So here's a simple solution using regex:
function alterateContent(string $html, string $imageFileName, string $replacement): string
{
$imageFileName = preg_quote($imageFileName, '/');
return preg_replace(
"/<img\h+src=\"{{media url=\"{$imageFileName}\"}}\".*?\/>/",
$replacement,
$html
);
}
Usage:
$newContent = alterateContent($yourHtmlString, 'image_name.png', '<b>Hi test</b>');
Note: this assumes the src attribute is always the first attribute of the image.
You can simply use preg_replace() for that, like this:
$newstring = preg_replace('~<img.*~', '<b>Hi test</b>', $oldstring);
Without the s modifier, . does not match newline characters, so the match stops at the end of the line and only that line is replaced.
If you need to replace the img with an exact src, you can do it like this:
$newstring = preg_replace('~<img src="' . preg_quote($img_source, '~') . '".*~', '<b>Hi test</b>', $oldstring);
If your source is only a filename without a path, while the img tag contains the path, you can use this:
$newstring = preg_replace('~<img src=".*?' . preg_quote($img_file, '~') . '".*~', '<b>Hi test</b>', $oldstring);
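If the replacement has to be generated from the filename (the "extract, alter and replace" part of the question), preg_replace_callback() can do all three steps in one pass. This is only a sketch: replaceMediaImages() and the $generate callback are hypothetical names, and the pattern assumes the src value looks exactly like {{media url="..."}} as in the question.

```php
<?php
// Hypothetical helper: $generate receives the file name and returns the replacement markup.
function replaceMediaImages(string $content, callable $generate): string
{
    // Matches <img ... src="{{media url="FILE"}}" ...>, with src in any attribute position.
    $pattern = '~<img\b[^>]*?\bsrc="\{\{media url="([^"]+)"\}\}"[^>]*>~i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($generate) {
            return $generate($m[1]); // $m[1] is the captured file name
        },
        $content
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $content;
}

$content = "<h1>Title</h1>\n<img src=\"{{media url=\"image_name.png\"}}\" alt=\"image_test\" />\n<h1>Footer</h1>";
$newContent = replaceMediaImages($content, function (string $file): string {
    return '<b>Hi ' . $file . '</b>';
});
echo $newContent, PHP_EOL;
```

Everything outside the matched tag is left byte-for-byte untouched, which is the reason for using a regex here rather than re-serialising the HTML.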

How to remove plain text from a string after using strip_tags()

I have a string and used the strip_tags() function to remove all tags except IMG, but I still have plain text next to my IMG element. Here is a visual example:
$myvariable = "This text needs to be removed<a href='blah_blah_blah'>Blah</a><img src='blah.jpg'>";
Using PHP's strip_tags() I was able to remove all tags except the <img> tag (which is what I want), but it didn't remove the text.
How do I remove the leftover text? The text will always be either before the tag or after it.
[ADDED MORE DETAILS]
$description = 'crazy stuff<img src="https://scontent.cdninstagram.com/t51.2885-15/e15/14287934_1389514537744146_673363238_n.jpg?ig_cache_key=MTMzNzM3MzgwNjAyNDY5NDAzMA%3D%3D.2">';
That's what the variable is actually holding.
Thanks in Advance
Instead of replacing something, you can simply extract the values you want:
(<(\w+)[^>]*>(?:.*?</\2>)?)
To be used with preg_match(), see a demo on regex101.com. The closing tag is optional in the pattern, because <img> has none.
In PHP:
<?php
$regex = '~(<(\w+)[^>]*>(?:.*?</\2>)?)~';
$string = 'crazy stuff<img src="https://scontent.cdninstagram.com/t51.2885-15/e15/14287934_1389514537744146_673363238_n.jpg?ig_cache_key=MTMzNzM3MzgwNjAyNDY5NDAzMA%3D%3D.2">here as well';
if (preg_match($regex, $string, $match)) {
echo $match[1];
}
?>
Please show your whole piece of code with the use of strip_tags.
You can try: preg_replace('~.*?(<img[^>]+>).*~s', '$1', $myvariable);
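Another way to look at it, sketched below: rather than removing the text around the tag, collect every <img> tag and throw away everything else. keepOnlyImages() is a hypothetical helper name, and the pattern assumes no ">" ever appears inside an attribute value.

```php
<?php
// Hypothetical helper: returns the <img> tags found in $html, concatenated, and nothing else.
function keepOnlyImages(string $html): string
{
    // [^>]* stops at the first ">", so a ">" inside an attribute value would cut the tag short.
    preg_match_all('~<img\b[^>]*>~i', $html, $matches);
    return implode('', $matches[0]);
}

$myvariable = "This text needs to be removed<a href='blah_blah_blah'>Blah</a><img src='blah.jpg'>";
echo keepOnlyImages($myvariable), PHP_EOL;
```

This works on the raw string, so the strip_tags() step is no longer needed, and it handles text before the tag, after it, or both.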

How to extract img src using preg_match

I have an array of HTML in different formats:
[amp;src]=>image, anotherone [posthtml]=>image2, anothertwo [nbsp;image3
How can I extract the img and the text with a single preg_match() call, so that I reliably get the image src and the text from the HTML? If that is not possible with preg_match(), is there another way to do it?
If anyone knows how to fix this, please reply.
I need a hand.
The recommended way is to use DOM:
$dom = new DOMDocument;
$dom->loadHTML($HTML);
$images = $dom->getElementsByTagName('img');
foreach($images as $im){
$attrs = $im->attributes;
$src = $attrs->getNamedItem('src')->nodeValue;
}
Using Regular expression:
preg_match_all("/<img .*?(?=src)src=\"([^\"]+)\"/si", $html, $m);
print_r($m);
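Since the question asks for both the image src and the text, here is a sketch that pulls both out of one DOM pass. extractImagesAndText() and the shape of the returned array are assumptions made for illustration; the input is assumed to be a non-empty HTML fragment.

```php
<?php
// Hypothetical helper: returns ['images' => [src, ...], 'text' => 'plain text'].
function extractImagesAndText(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep warnings about sloppy markup quiet
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $images[] = $img->getAttribute('src');
        }
    }

    // textContent concatenates all text nodes; collapse the whitespace runs left behind by tags.
    $root = $dom->documentElement;
    $text = $root === null ? '' : trim(preg_replace('/\s+/', ' ', $root->textContent));

    return ['images' => $images, 'text' => $text];
}

$result = extractImagesAndText('<p>Hello <img src="a.png"> world</p><img src="b.jpg">');
print_r($result);
```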

How to strip tags in PHP using regex?

$string = 'text <span style="color:#f09;">text</span>
<span class="data" data-url="http://www.google.com">google.com</span>
text <span class="data" data-url="http://www.yahoo.com">yahoo.com</span> text.';
What I want to do is get the data-url from all spans with the class data. So, it should output:
$string = 'text <span style="color:#f09;">text</span>
http://www.google.com text http://www.yahoo.com text.';
And then I want to remove all the remaining html tags.
$string = strip_tags($string);
Output:
$string = 'text text http://www.google.com text http://www.yahoo.com text.';
Can someone please tell me how this can be done?
If your string contains more than just the HTML snippet you show, you should use DOM with this XPath
//span/@data-url
Example:
$dom = new DOMDocument;
$dom->loadHTML($string);
$xp = new DOMXPath($dom);
foreach( $xp->query('//span/@data-url') as $node ) {
echo $node->nodeValue, PHP_EOL;
}
The above would output
http://www.google.com
http://www.yahoo.com
When you already have the HTML loaded, you can also do
echo $dom->documentElement->textContent;
which returns the same result as strip_tags($string) in this case:
text text
google.com
text yahoo.com text.
Try SimpleXML: loop over the elements, check that the class attribute matches, and grab the data-url values.
preg_match_all("/data\" data-url=\"([^\"]*)/i", $string, $urls);
You can fetch all the URLs this way.
You can also use SimpleXML, as hsz mentioned.
The short answer is: don't. There's a lovely rant somewhere around SO explaining why parsing html with regexes is a bad idea. Essentially it boils down to 'html is not a regular language so regular expressions are not adequate to parse it'. What you need is something DOM aware.
As @hsz said, SimpleXML is a good option if you know that your HTML validates as XML. Better might be DOMDocument::loadHTML, which doesn't require well-formed HTML. Once your HTML is in a DOMDocument object, you can extract what you want very easily. Check out the docs here.
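To get the exact output asked for in the question, the two DOM steps can be combined: swap each matching span for its data-url, then read the text. This is a sketch only; spansToUrls() is a hypothetical name, the XPath assumes the class attribute is exactly "data", and the whitespace collapsing goes slightly beyond what strip_tags() does.

```php
<?php
// Hypothetical helper: replaces <span class="data" data-url="..."> with the URL, then strips all markup.
function spansToUrls(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // The XPath result is a snapshot, so replacing nodes inside the loop does not skip any.
    foreach ($xpath->query('//span[@class="data"][@data-url]') as $span) {
        $span->parentNode->replaceChild(
            $dom->createTextNode($span->getAttribute('data-url')),
            $span
        );
    }

    // Reading textContent drops the remaining tags; line breaks are collapsed to single spaces.
    return trim(preg_replace('/\s+/', ' ', $dom->documentElement->textContent));
}

$string = 'text <span style="color:#f09;">text</span>
<span class="data" data-url="http://www.google.com">google.com</span>
text <span class="data" data-url="http://www.yahoo.com">yahoo.com</span> text.';
echo spansToUrls($string), PHP_EOL;
```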

Using regex to remove HTML tags

I need to convert
$text = 'We had <i>fun</i>. Look at <a href="http://example.com">this photo</a> of Joe';
[Edit] There could be multiple links in the text.
to
$text = 'We had fun. Look at this photo (http://example.com) of Joe';
All HTML tags are to be removed and the href value from <a> tags needs to be added like above.
What would be an efficient way to solve this with regex? Any code snippet would be great.
First do a preg_replace to keep the link. You could use:
preg_replace('/<a href="(.*?)">(.*?)<\/a>/', '$2 ($1)', $str);
Then use strip_tags, which will finish off the rest of the tags.
Try an XML parser to replace any tag with its inner HTML and the a tags with their href attribute.
http://www.php.net/manual/en/book.domxml.php
The DOM solution:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach($xpath->query('//a[@href]') as $node) {
$textNode = new DOMText(sprintf('%s (%s)',
$node->nodeValue, $node->getAttribute('href')));
$node->parentNode->replaceChild($textNode, $node);
}
echo strip_tags($dom->saveHTML());
and the same without XPath:
$dom = new DOMDocument;
$dom->loadHTML($html);
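// Copy the live node list first so that replacing nodes does not skip elements.
foreach(iterator_to_array($dom->getElementsByTagName('a')) as $node) {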
if($node->hasAttribute('href')) {
$textNode = new DOMText(sprintf('%s (%s)',
$node->nodeValue, $node->getAttribute('href')));
$node->parentNode->replaceChild($textNode, $node);
}
}
echo strip_tags($dom->saveHTML());
All it does is load any HTML into a DOMDocument instance. In the first case it uses an XPath expression, which is kinda like SQL for XML, and gets all links with an href attribute. It then creates a text node from the link's text content and its href attribute and replaces the link with it. The second version just uses the DOM API and no XPath.
Yes, it's a few lines more than Regex but this is clean and easy to understand and it won't give you any headaches when you need to add additional logic.
I've done things like this using variations of substring and replace. I'd probably use regex today, but you wanted an alternative, so:
For the <i> tags, I'd do something like:
$text = str_replace("<i>", "", $text);
$text = str_replace("</i>", "", $text);
(My PHP is really rusty, so treat this as a sketch -- the idea is what I'm sharing.)
The <a> tag is a bit more tricky, but it can be done. You need to find the point where <a starts and the > that ends the opening tag. Then you cut out that whole span and remove the closing </a>.
That might go something like:
$start = strpos($text, "<a");
$end = strpos($text, ">", $start);
$text = substr($text, 0, $start) . substr($text, $end + 1);
$text = str_replace("</a>", "", $text);
(I don't know if this will work; again, the idea is what I want to communicate. I hope the code fragments help, but they probably don't work "out of the box". There are also a lot of possible bugs in the code snippets depending on your exact implementation and environment.)
Reference:
strpos - http://www.php.net/manual/en/function.strpos.php
str_replace - http://www.php.net/manual/en/function.str-replace.php
substr - http://php.net/manual/en/function.substr.php
It's also very easy to do with a parser:
# available from http://simplehtmldom.sourceforge.net
include('simple_html_dom.php');
# parse and echo
$html = str_get_html('We had <i>fun</i>. Look at <a href="http://example.com">this photo</a> of Joe');
$a = $html->find('a');
$a[0]->outertext = "{$a[0]->innertext} ( {$a[0]->href} )";
echo strip_tags($html);
And that produces the output you want for your test case.
