Removing all links and divs of certain class - php

Let's say I have the following string (part of a much larger string with multiple similar strings):
$str = "<div class='testdiv remove'>randomtext</div>
<div class='testdiv'>randomtext randomtext</div>";
The class 'remove' was added through a JavaScript function. How would I remove all elements of the class 'remove' and all links so that the string becomes this:
$str = "<div class='testdiv'>randomtext </div>";
I can't use jQuery to remove these tags since I have to feed this into a PHP library function. How would I remove these?

Use a DOM parser such as Simple HTML DOM: http://simplehtmldom.sourceforge.net/

Use a regular expression :)
// "~" is used as the delimiter because the pattern itself contains "/"
$pattern = "~(?:<div class='testdiv remove'>[\s\S]+?</div>|<a[^>]+>[^<]+</a>)~i";
$str = preg_replace($pattern, "", $str);
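For comparison, the DOM-parser suggestion can be sketched with PHP's built-in DOMDocument and DOMXPath instead of a third-party library. This is only a minimal sketch: the sample input, including the `<a>` tag, is an assumption made for illustration, and DOMDocument normalizes the markup (for example, attribute quotes) when it serializes the result.

```php
<?php
// Hypothetical input; the link inside the second div is assumed for illustration.
$str = "<div class='testdiv remove'>randomtext</div>"
     . "<div class='testdiv'>randomtext <a href='#'>randomtext</a></div>";

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep warnings about imperfect HTML quiet
$doc->loadHTML($str);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match any element whose class list contains the word "remove", plus every link.
$query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' remove ')] | //a";

foreach (iterator_to_array($xpath->query($query)) as $node) {
    if ($node->parentNode !== null) {
        $node->parentNode->removeChild($node);
    }
}

// loadHTML() wraps the fragment in <html><body>; serialize only the body's children.
$result = '';
$body = $doc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}

echo $result;
```

Unlike the regular expression, this does not depend on the exact spelling or order of the class attribute.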

Related

Find a specific place in a string and get data from it

I have a string with multiple image tags in it, like this:
<img src="/files/028ou2p5g/blogs/9d66329f4/5844644f69fe7-64.jpg">
I want to find the FIRST such tag and get the image name from it:
5844644f69fe7-64.jpg
How can this be done in PHP, assuming there is a lot of other text and tags in the string?
You should do what @moopet suggested. This is the code, but please give credit to @moopet.
$str = '<img src="/files/028ou2p5g/blogs/9d66329f4/5844644f69fe7-64.jpg">';
$doc = new DOMDocument();
$doc->loadHTML($str);
$first_img = $doc->getElementsByTagName("img")[0];
var_dump( basename($first_img->getAttribute('src')) );
Don't use regex for this. Use PHP's DOM parser or an alternative to extract the tags, then use PHP's basename() function on the src attribute to extract the filename.
Use preg_match_all() to find all occurrences and then take the first one.
Example:
<?php
// $html holds the input string
preg_match_all('/<\s*img[^<>]+?src\s*=\s*[\'\"][^<>\'\"]+?\/([^<>\'\"\/]+\.jpg)/', $html, $matches, PREG_SET_ORDER);
// $matches[0] is the first <img> match; index 1 holds the captured file name
var_dump($matches[0][1]);
?>

How to remove plain text from a string after using strip_tags()

I have a string and used the strip_tags() function to remove all tags except IMG, but I still have plain text next to my IMG element. Here is an example:
$myvariable = "This text needs to be removed<a href='blah_blah_blah'>Blah</a><img src='blah.jpg'>";
Using PHP's strip_tags() I was able to remove all tags except the <img> tag (which is what I want), but it didn't remove the text.
How do I remove the leftover text? The text will always be either before or after the tag.
[ADDED MORE DETAILS]
$description = 'crazy stuff<img src="https://scontent.cdninstagram.com/t51.2885-15/e15/14287934_1389514537744146_673363238_n.jpg?ig_cache_key=MTMzNzM3MzgwNjAyNDY5NDAzMA%3D%3D.2">';
That's what the variable is actually holding.
Thanks in advance.
Instead of replacing something, you can extract the values you want:
<img\b[^>]*>
Use it with preg_match().
In PHP:
<?php
// <img> has no closing tag, so the pattern matches the opening tag only
$regex = '~<img\b[^>]*>~i';
$string = 'crazy stuff<img src="https://scontent.cdninstagram.com/t51.2885-15/e15/14287934_1389514537744146_673363238_n.jpg?ig_cache_key=MTMzNzM3MzgwNjAyNDY5NDAzMA%3D%3D.2">here as well';
if (preg_match($regex, $string, $match)) {
    echo $match[0];
}
?>
Please show the complete code where you use strip_tags().
You can try: preg_replace('~^.*(<img[^>]+>).*$~s', '$1', $myvariable);
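If the string can contain more than one image, a variation on the extraction idea is to collect every `<img>` tag with preg_match_all() and join the matches, which discards all surrounding text. This is a minimal sketch; the sample string and file names are made up for illustration.

```php
<?php
// Hypothetical input with text before, between and after two images.
$description = 'crazy stuff<img src="a.jpg">here as well<img src="b.jpg">the end';

// Collect every <img ...> tag; $matches[0] holds the full matches.
preg_match_all('~<img\b[^>]*>~i', $description, $matches);

// Joining the matches keeps the images and drops everything else.
$imagesOnly = implode('', $matches[0]);

echo $imagesOnly;
```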

Finding and replacing attributes using preg_replace

I am trying to redo some forms that have uppercase field names with spaces. There are hundreds of fields and 50+ forms, so I decided to write a PHP script that parses the HTML of each form.
I now have a textarea that I paste the HTML into, and I want to change all the field names from
name="Here is a form field name"
to
name="here_is_a_form_field_name"
How could I, in one command, parse the HTML and change every name attribute so that it is lowercase and spaces are replaced with underscores?
I am assuming preg_replace with an expression?
Thanks!
I would suggest not using regex for manipulation of HTML. I would use DOMDocument instead, something like the following:
$dom = new DOMDocument();
$dom->loadHTMLFile('filename.html');
$xpath = new DOMXPath($dom);
// loop over every form field that has a name attribute
foreach ($xpath->query('//input[@name] | //select[@name] | //textarea[@name]') as $item) {
    // set up the new value, i.e. lowercase and spaces replaced with underscores
    $newval = $item->getAttribute('name');
    $newval = str_replace(' ', '_', $newval);
    $newval = strtolower($newval);
    // change the attribute
    $item->setAttribute('name', $newval);
}
// save the document; saveHTML() returns the markup as a string
$html = $dom->saveHTML();
An alternative would be to use something like Simple HTML DOM Parser for the job. There are some good examples on the linked site.
I agree that preg_replace(), or rather preg_replace_callback(), is the right tool for the job. Here's an example of how to use it for your task:
$file_contents = preg_replace_callback('/ name="([^"]*)"/', function ($matches) {
    // $matches[1] is the attribute value only, so the leading space and name=" stay untouched
    return ' name="' . str_replace(' ', '_', strtolower($matches[1])) . '"';
}, $file_contents);
You should, however, check the results afterwards using a diff tool and fine-tune the pattern if necessary.
The reason why I would recommend against a DOM parser is that they usually choke on invalid HTML or on files that contain, for example, tags for templating engines.
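To make the callback approach concrete, here is a small self-contained sketch run against sample markup. The form fields are hypothetical, and the lookbehind `(?<=\s)` is an assumption added so that attributes such as `data-name` are left alone.

```php
<?php
// Hypothetical form markup used only for illustration.
$html = '<input type="text" name="Here is a form field name">'
      . '<select name="Second Field"></select>';

$result = preg_replace_callback(
    '/(?<=\s)name="([^"]*)"/',
    function (array $m): string {
        // Only the captured attribute value is lowercased and underscored.
        return 'name="' . str_replace(' ', '_', strtolower($m[1])) . '"';
    },
    $html
);

echo $result;
```

Capturing just the value keeps the rest of the tag, including other attributes that contain spaces or uppercase letters, exactly as it was.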
This is your solution:
<?php
$nameStr = "Here is a form field name";
// str_replace() already replaces every occurrence, so no loop is needed
$nameStr = strtolower(str_replace(' ', '_', $nameStr));
echo $nameStr;
?>

preg_replace only OUTSIDE tags ? (... we're not talking full 'html parsing', just a bit of markdown)

What is the easiest way of applying highlighting of some text excluding text within OCCASIONAL tags "<...>"?
CLARIFICATION: I want the existing tags PRESERVED!
$t =
preg_replace(
"/(markdown)/",
"<strong>$1</strong>",
"This is essentially plain text apart from a few html tags generated with some
simplified markdown rules: <a href=markdown.html>[see here]</a>");
Which should display as:
"This is essentially plain text apart from a few html tags generated with some simplified markdown rules: see here"
... BUT NOT MESS UP the text inside the anchor tag (i.e. <a href=markdown.html> ).
I've heard the arguments of not parsing html with regular expressions, but here we're talking essentially about plain text except for minimal parsing of some markdown code.
Actually, this seems to work ok:
<?php
$item="markdown";
$t="This is essentially plain text apart from a few html tags generated
with some simplified markdown rules: <a href=markdown.html>[see here]</a>";
//_____1. apply emphasis_____
$t = preg_replace("|($item)|","<strong>$1</strong>",$t);
// "This is essentially plain text apart from a few html tags generated
// with some simplified <strong>markdown</strong> rules: <a href=
// <strong>markdown</strong>.html>[see here]</a>"
//_____2. remove emphasis if WITHIN opening and closing tag____
$t = preg_replace("|(<[^>]+?)(<strong>($item)</strong>)([^<]+?>)|","$1$3$4",$t);
// this preserves the text before ($1), after ($4)
// and inside <strong>..</strong> ($2), but without the tags ($3)
// "This is essentially plain text apart from a few html tags generated
// with some simplified <strong>markdown</strong> rules: <a href=markdown.html>
// [see here]</a>"
?>
A string like $item="odd|string" would cause some problems, because "|" is the pattern delimiter here, but I won't be using that kind of string anyway... (passing it through preg_quote($item, '|') should take care of that.)
You could split the string into tag/no-tag parts using preg_split:
$parts = preg_split('/(<(?:[^"\'>]|"[^"<]*"|\'[^\'<]*\')*>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
Then you can iterate over the parts while skipping every second part (the odd indices, i.e. the tag parts) and apply your replacement to the remaining text parts:
for ($i=0, $n=count($parts); $i<$n; $i+=2) {
$parts[$i] = preg_replace("/(markdown)/", "<strong>$1</strong>", $parts[$i]);
}
At the end put everything back together with implode:
$str = implode('', $parts);
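Put together, the three steps above can be sketched as one helper function. The function name `highlightOutsideTags` and the shortened sample text are hypothetical; the preg_quote() call is an addition so that search words containing regex metacharacters are handled.

```php
<?php
// Hypothetical helper combining the split / replace / implode steps.
function highlightOutsideTags(string $text, string $word): string
{
    // With PREG_SPLIT_DELIM_CAPTURE, even indices are text and odd indices are tags.
    $parts = preg_split(
        '/(<(?:[^"\'>]|"[^"<]*"|\'[^\'<]*\')*>)/',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $pattern = '/(' . preg_quote($word, '/') . ')/';

    // Apply the replacement to the text parts only.
    for ($i = 0, $n = count($parts); $i < $n; $i += 2) {
        $parts[$i] = preg_replace($pattern, '<strong>$1</strong>', $parts[$i]);
    }

    return implode('', $parts);
}

$t = 'Some simplified markdown rules: <a href=markdown.html>[see here]</a>';
$highlighted = highlightOutsideTags($t, 'markdown');

echo $highlighted;
```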
But note that this is really not the best solution. You would be better off using a proper HTML parser like PHP's DOM library. See for example these related questions:
Highlight keywords in a paragraph
Regex / DOMDocument - match and replace text not in a link
First, replace the string wherever it follows a tag; prepending a dummy tag ensures that text at the very start of the string also follows a tag:
$show = preg_replace("|(>[^<]*)(markdown)|i", '$1<strong>$2</strong>', "<null>$t");
Then delete the dummy tag:
$show = preg_replace("|<null>|", '', $show);
You could split your string into an array at every '<' or '>' using preg_split(), then loop through that array and replace only in the entries that are not tags. Afterwards, combine the array into a string using implode().
This regex should strip all HTML opening and closing tags: /(<.*?>)+/
You can use it with preg_replace like this:
$test = "Hello <strong>World!</strong>";
$regex = "/(<.*?>)+/";
$result = preg_replace($regex,"",$test);
Actually, this is not very efficient, but it worked for me:
$your_string = '...';
$search = 'markdown';
$left = '<strong>';
$right = '</strong>';
$left_Q = preg_quote($left, '#');
$right_Q = preg_quote($right, '#');
$search_Q = preg_quote($search, '#');
while(preg_match('#(>|^)[^<]*(?<!'.$left_Q.')'.$search_Q.'(?!'.$right_Q.')[^>]*(<|$)#isU', $your_string))
$your_string = preg_replace('#(^[^<]*|>[^<]*)(?<!'.$left_Q.')('.$search_Q.')(?!'.$right_Q.')([^>]*<|[^>]*$)#isU', '${1}'.$left.'${2}'.$right.'${3}', $your_string);
echo $your_string;

Regex Replace with Backreference modified by functions

I want to replace the class with the div text, like this:
This: <div class="grid-flags" >FOO</div>
Becomes: <div class="iconFoo" ></div>
So the class is changed to "icon" followed by ucfirst(strtolower(FOO)), and the text is removed.
Test HTML
<div class="grid-flags" >FOO</div>
Pattern
'/class=\"grid-flags\" \>(FOO|BAR|BAZ)/e'
Replacement
'class="icon'.ucfirst(strtolower($1).'"'
This is one example of a replacement string I've tried out of seemingly hundreds. I read that the /e modifier evaluates the PHP code but I don't understand how it works in my case because I need the double quotes around the class name so I'm lost as to which way to do this.
I tried variations on the backreference, e.g. strtolower('$1'), strtolower('\1'), strtolower('{$1}').
I've tried single and double quotes, various escaping, etc., and nothing has worked yet.
I even tried preg_replace_callback() with no luck:
function callback($matches){
return 'class="icon"'.ucfirst(strtolower($matches[0])).'"';
}
It was difficult for me to try to work out what you meant, but I think you want something like this:
preg_replace('/class="grid-flags" \>(FOO|BAR|BAZ)/e',
'\'class="icon\'.ucfirst(strtolower("$1")).\'">\'',
$text);
Output for your example input:
<div class="iconFoo"></div>
If this isn't what you want, could you please give us some example inputs and outputs?
And I have to agree that this would be easier with an HTML parser.
Instead of using the e(valuate) modifier, which was deprecated in PHP 5.5 and removed in PHP 7.0, you can use preg_replace_callback().
$text = '<div class="grid-flags" >FOO</div>';
$pattern = '/class="grid-flags" >(FOO|BAR|BAZ)/';
$myCB = function($cap) {
    // lowercase first so that FOO becomes Foo
    return 'class="icon'.ucfirst(strtolower($cap[1])).'" >';
};
echo preg_replace_callback($pattern, $myCB, $text);
But instead of using regular expressions, you might want to consider a more suitable parser for HTML, like simple_html_dom or PHP's DOM extension.
This works for me:
$html = '<div class="grid-flags" >FOO</div>';
echo preg_replace_callback(
    '/class *= *"grid-flags" *>(FOO|BAR|BAZ)/',
    // an anonymous function replaces create_function(), which was removed in PHP 8.0
    function ($matches) {
        return 'class="icon' . ucfirst(strtolower($matches[1])) . '">';
    },
    $html
);
Just be aware of the problems of parsing HTML with regex.
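Since several answers mention that an HTML parser would be easier, here is a minimal sketch of that route using PHP's DOM extension. It assumes the class attribute is exactly "grid-flags" and that the div contains only the flag text; both are assumptions based on the example in the question.

```php
<?php
$html = '<div class="grid-flags" >FOO</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$out = '';
foreach ($xpath->query('//div[@class="grid-flags"]') as $div) {
    // Build the new class name from the div's text, e.g. FOO becomes iconFoo.
    $label = trim($div->textContent);
    $div->setAttribute('class', 'icon' . ucfirst(strtolower($label)));

    // Remove the text by dropping all child nodes.
    while ($div->firstChild !== null) {
        $div->removeChild($div->firstChild);
    }

    // Serialize just this element rather than the whole wrapped document.
    $out .= $doc->saveHTML($div);
}

echo $out;
```

No quoting or escaping tricks are needed here, because the class name is built with ordinary string functions rather than inside a replacement string.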
