Regex to find an expression and insert into the middle - php

I'll be as brief as I can. I'm trying to use preg_replace with a regex to find a digit, but I want to edit the string non-destructively, so the matched text stays in place.
An example (an approximation, because of data protection):
$subject_string = 'section 1.1: Disability ........' ;
$outcome = preg_replace( '/$section[\d.\d]+/' , '<hr/>' , $subject_string );
// desired $outcome: "<hr/>section 1.1: Disability ........"
Any help would be gratefully received

Use
\bsection\s*\d+(?:\.\d+)*:
Replace with <hr/>$0, where $0 stands for the whole match, so the matched text is kept.
EXPLANATION
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
section 'section'
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
: ':'
Code snippet:
$re = '/\bsection\s*\d+(?:\.\d+)*:/';
$str = 'section 1.1: Disability ........';
$subst = '<hr/>$0';
$result = preg_replace($re, $subst, $str);
echo "The result of the substitution is ".$result;
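To illustrate how $0 keeps the matched text in place, here is a small sketch that runs the same pattern over several headings at once. The sample lines are made up for illustration; only the pattern and the replacement come from the answer above.

```php
<?php
// Hypothetical sample input; the real data is not shown in the question.
$lines = "section 1: Intro\nsection 1.1: Disability\nsubsection 3: Untouched";

// $0 in the replacement is the whole match, so nothing is lost.
$result = preg_replace('/\bsection\s*\d+(?:\.\d+)*:/', '<hr/>$0', $lines);

echo $result;
```

The third line is left alone because \b requires "section" to start at a word boundary, which is not the case inside "subsection".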

Related

Match everything in brackets after a specific character

I have the following string:
$text = 'These are my cards. They are {{Archetype|Agumon}} and {{Fire|Gabumon}}'
I'm trying to replace every occurrence like {{Archetype|Agumon}} with [Agumon].
I've been struggling to get my head around it and have come up with this so far:
$string = preg_replace('#\{\{(.*?)\}\}#', '[$1]', $text);
This results in:
These are my cards. They are [Archetype|Agumon] and [Fire|Gabumon]
So I am currently matching the full text found in between the double curly brackets.
I thought it would be something like this: \|(.*?) to get the match after the | character in the curly brackets but to no avail.
You may use:
\{\{[^}]*\|([^}]*)\}\}
Breakdown:
\{\{ - Match "{{" literally.
[^}]* - Greedily match zero or more characters other than '}'.
\| - Match a pipe character.
([^}]*) - Match zero or more characters other than '}' and capture them in group 1.
\}\} - Match "}}" literally.
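As a quick check of the breakdown above, this sketch plugs the pattern into preg_replace with the string from the question. The variable name $result is only illustrative.

```php
<?php
$text = 'These are my cards. They are {{Archetype|Agumon}} and {{Fire|Gabumon}}';

// Group 1 holds whatever follows the last pipe inside the braces.
$result = preg_replace('/\{\{[^}]*\|([^}]*)\}\}/', '[$1]', $text);

echo $result;
```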
Use
preg_replace('/{{(?:(?!{|}})[^|]*\|(.*?))}}/s', '[$1]', $text)
It will also support { and } in the part before the pipe.
Explanation
--------------------------------------------------------------------------------
{{ '{{'
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
{ '{'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
}} '}}'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
.*? any character (0 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
}} '}}'
PHP code:
$text = 'These are my cards. They are {{Archetype|Agumon}} and {{Fire|Gabumon}}';
echo preg_replace('/{{(?:(?!{|}})[^|]*\|(.*?))}}/s', '[$1]', $text);
Results: These are my cards. They are [Agumon] and [Gabumon]

PHP preg_replace remove specific parts from string

I'm having problems understanding regex in PHP. I have this img src:
src="http://example.com/javascript:gallery('/info/2005/image.jpg',383,550)"
and need to build from it this:
src="http://example.com/info/2005/image.jpg"
How is it possible to cut the first and last parts from the string to obtain a clean link without the javascript part?
Right now I'm using this regex:
$cont = 'src="http://example.com/javascript:gallery(\'/info/2005/image.jpg\',383,550)"';
$cont = preg_replace("/(src=\")(.*)(\/info)/","$1http://example.com$3", $cont);
and output is:
src="http://example.com/info/2005/image.jpg',383,550)"
As an alternative solution, you might also capture the src="http://example.com part by matching the protocol in group 1, so you can use it in the replacement.
(src="https?://[^/]+)/[^']*'(/info[^']*)'[^"]*
Explanation
(src="https?://[^/]+)/ Capture group 1, match src="http, optional s, :// and till the first /
[^']*' Match any char except ', then match '
(/info[^']*) Capture group 2, match /info followed by any char except '
'[^"]* Match the ' followed by matching any char except "
$cont = 'src="http://example.com/javascript:gallery(\'/info/2005/image.jpg\',383,550)"';
$cont = preg_replace("~(src=\"https?://[^/]+)/[^']*'(/info[^']*)'[^\"]*~", '$1$2', $cont);
echo $cont;
Output
src="http://example.com/info/2005/image.jpg"
Use
preg_replace("/src=\"\K.*(\/info[^']*)'[^\"]*/", 'http://example.com$1', $cont)
Explanation
--------------------------------------------------------------------------------
src= 'src='
--------------------------------------------------------------------------------
\" '"'
--------------------------------------------------------------------------------
\K match reset operator
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
info 'info'
--------------------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
[^\"]* any character except: '\"' (0 or more
times (matching the most amount possible))
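To see what \K does in practice, here is a minimal sketch that runs the expression on the string from the question. Everything before \K has to match but is dropped from the reported match, so src=" survives the replacement without being captured. The variable name $result is only illustrative.

```php
<?php
$cont = 'src="http://example.com/javascript:gallery(\'/info/2005/image.jpg\',383,550)"';

// Only the text after \K is replaced; the closing quote is never matched.
$result = preg_replace("/src=\"\K.*(\/info[^']*)'[^\"]*/", 'http://example.com$1', $cont);

echo $result;
```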

PHP preg_split on spaces, but not within tags

I am using preg_split("/\"[^\"]*\"(*SKIP)(*F)|\x20/", $input_line); and running it on phpliveregex.com.
It produces this array:
array(10
0=><b>test</b>
1=>or
2=><em>oh
3=>yeah</em>
4=>and
5=><i>
6=>oh
7=>yeah
8=></i>
9=>"ye we 'hold' it"
)
That is NOT what I want. It should be separated by spaces only outside HTML tags, like this:
array(6
0=><b>test</b>
1=>or
2=><em>oh yeah</em>
3=>and
4=><i>oh yeah</i>
5=>"ye we 'hold' it"
)
With this regex I can only add an exception for "double quotes", but I really need help adding more, for tags like <img/><a></a><pre></pre><code></code><strong></strong><b></b><em></em><i></i>.
Any explanation of how that regex works is also appreciated.
It's easier to use DOMDocument, since you don't need to describe what an HTML tag is and how it looks. You only need to check the nodeType. When it's a text node, split it with preg_match_all (which is handier here than designing a pattern for preg_split):
$html = 'spaces in a text node <b>test</b> or <em>oh yeah</em> and <i>oh yeah</i>
"ye we \'hold\' it"
"unclosed double quotes at the end';
$dom = new DOMDocument;
$dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED);
$nodeList = $dom->documentElement->childNodes;
$results = [];
foreach ($nodeList as $childNode) {
    if ($childNode->nodeType == XML_TEXT_NODE &&
        preg_match_all('~[^\s"]+|"[^"]*"?~', $childNode->nodeValue, $m))
        $results = array_merge($results, $m[0]);
    else
        $results[] = $dom->saveHTML($childNode);
}
print_r($results);
Note: I have chosen a default behaviour for a double-quoted part that stays unclosed (no closing quote); feel free to change it.
Note 2: Sometimes the LIBXML_ constants are not defined. You can work around this by testing for the constant first and defining it when needed:
if (!defined('LIBXML_HTML_NOIMPLIED'))
    define('LIBXML_HTML_NOIMPLIED', 8192);
Description
Instead of using a split command just match the sections you want
<(?:(?:img)(?=[\s>\/])(?:[^>=]|=(?:'[^']*'|"[^"]*"|[^'"\s>]*))*\s?\/?>|(a|span|pre|code|strong|b|em|i)(?=[\s>\\])(?:[^>=]|=(?:'[^']*'|"[^"]*"|[^'"\s>]*))*\s?\/?>.*?<\/\1>)|(?:"[^"]*"|[^"<]*)*
Example
Live Demo
https://regex101.com/r/bK8iL3/1
Sample text
Note the difficult edge case in the second paragraph
<b>test</b> or <strong> this </strong><em> oh yeah </em> and <i>oh yeah</i> Here we are "ye we 'hold' it"
some<img/>gfsf<a html="droids.html" onmouseover=' var x=" Not the droid I am looking for " ; '>droides</a><pre></pre><code></code><strong></strong><b></b><em></em><i></i>
Sample Matches
MATCH 1
0. [0-11] `<b>test</b>`
MATCH 2
0. [11-15] ` or `
MATCH 3
0. [15-38] `<strong> this </strong>`
MATCH 4
0. [38-56] `<em> oh yeah </em>`
MATCH 5
0. [56-61] ` and `
MATCH 6
0. [61-75] `<i>oh yeah</i>`
MATCH 7
0. [75-111] ` Here we are "ye we 'hold' it" some`
MATCH 8
0. [111-117] `<img/>`
MATCH 9
0. [117-121] `gfsf`
MATCH 10
0. [121-213] `<a html="droids.html" onmouseover=' var x=" Not the droid I am looking for " ; '>droides</a>`
MATCH 11
0. [213-224] `<pre></pre>`
MATCH 12
0. [224-237] `<code></code>`
MATCH 13
0. [237-254] `<strong></strong>`
MATCH 14
0. [254-261] `<b></b>`
MATCH 15
0. [261-270] `<em></em>`
MATCH 16
0. [270-277] `<i></i>`
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
< '<'
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
img 'img'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
[\s>\/] any character of: whitespace (\n, \r,
\t, \f, and " "), '>', '\/'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
----------------------------------------------------------------------
[^>=] any character except: '>', '='
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
= '='
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
' '\''
----------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
' '\''
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^'"\s>]* any character except: ''', '"',
whitespace (\n, \r, \t, \f, and "
"), '>' (0 or more times (matching
the most amount possible))
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
\s? whitespace (\n, \r, \t, \f, and " ")
(optional (matching the most amount
possible))
----------------------------------------------------------------------
\/? '/' (optional (matching the most amount
possible))
----------------------------------------------------------------------
> '>'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
a 'a'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
span 'span'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
pre 'pre'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
code 'code'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
strong 'strong'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
b 'b'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
em 'em'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
i 'i'
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
[\s>\\] any character of: whitespace (\n, \r,
\t, \f, and " "), '>', '\\'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
----------------------------------------------------------------------
[^>=] any character except: '>', '='
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
= '='
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
' '\''
----------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
' '\''
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^'"\s>]* any character except: ''', '"',
whitespace (\n, \r, \t, \f, and "
"), '>' (0 or more times (matching
the most amount possible))
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
\s? whitespace (\n, \r, \t, \f, and " ")
(optional (matching the most amount
possible))
----------------------------------------------------------------------
\/? '/' (optional (matching the most amount
possible))
----------------------------------------------------------------------
> '>'
----------------------------------------------------------------------
.*? any character (0 or more times (matching
the least amount possible))
----------------------------------------------------------------------
< '<'
----------------------------------------------------------------------
\/ '/'
----------------------------------------------------------------------
\1 what was matched by capture \1
----------------------------------------------------------------------
> '>'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^"<]* any character except: '"', '<' (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
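The "match instead of split" idea can be sketched in PHP with preg_match_all. The pattern below is a deliberately simplified, hypothetical stand-in for the full expression above: it ignores quoted attribute values that contain > and does not handle nested tags of the same name, so treat it as an illustration of the approach rather than a replacement.

```php
<?php
$input = '<b>test</b> or <em>oh yeah</em> and <i>oh yeah</i> "ye we \'hold\' it"';

// Simplified, hypothetical pattern (not the full expression above):
//   1. a paired tag from a fixed list, up to its closing tag
//   2. a standalone <img> tag
//   3. a double-quoted run
//   4. any other run of characters without spaces, < or "
$pattern = '~<(a|pre|code|strong|b|em|i)\b[^>]*>.*?</\1>|<img\b[^>]*>|"[^"]*"|[^\s<"]+~s';

preg_match_all($pattern, $input, $matches);
print_r($matches[0]);
```

Because whitespace outside tags and quotes is never part of a match, the pieces come back already trimmed, which is the shape the question asked for.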

PHP preg_replace find match in html but not if its a html attribute

I have two regexes, one which matches [value] and another which matches HTML attributes, but I need to combine them into a single regex.
This is the regex I'm working with to find [value]
$tagregexp = '[a-zA-Z_\-][0-9a-zA-Z_\-\+]{2,}';
$pattern =
'\\[' // Opening bracket
. '(\\[?)' // 1: Optional second opening bracket for escaping shortcodes: [[tag]]
. "($tagregexp)" // 2: Shortcode name
. '(?![\\w-])' // Not followed by word character or hyphen
. '(' // 3: Unroll the loop: Inside the opening shortcode tag
. '[^\\]\\/]*' // Not a closing bracket or forward slash
. '(?:'
. '\\/(?!\\])' // A forward slash not followed by a closing bracket
. '[^\\]\\/]*' // Not a closing bracket or forward slash
. ')*?'
. ')'
. '(?:'
. '(\\/)' // 4: Self closing tag ...
. '\\]' // ... and closing bracket
. '|'
. '\\]' // Closing bracket
. '(?:'
. '(' // 5: Unroll the loop: Optionally, anything between the opening and closing shortcode tags
. '[^\\[]*+' // Not an opening bracket
. '(?:'
. '\\[(?!\\/\\2\\])' // An opening bracket not followed by the closing shortcode tag
. '[^\\[]*+' // Not an opening bracket
. ')*+'
. ')'
. '\\[\\/\\2\\]' // Closing shortcode tag
. ')?'
. ')'
. '(\\]?)'; // 6: Optional second closing bracket for escaping shortcodes: [[tag]]
This regex (\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']? matches an attribute and a value.
I would like the regex to match [value] in the following examples
<div [value] ></div>
<div>[value]</div>
but not find a match in this example
<input attr="attribute[value]"/>
Just need to make it into a single regex to use in my preg_replace_callback
preg_replace_callback("/$pattern/", 'replace_matches', $html); // pattern wrapped in delimiters, callback passed by name
Foreword
On the surface it looks like you're attempting to parse HTML with a regular expression. I feel obligated to point out that it's not advisable to use a regex to parse HTML because of all the obscure edge cases that can crop up, but it seems that you have some control over the HTML, so you should be able to avoid many of the edge cases the regex police cry about.
Description
<\w+\s(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\[(?<DesiredValue>[^\]]*)\])
|
<\w+\s?(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>
(?:(?!<\/div>)(?!\[).)*\[(?<DesiredValue>[^\]]*)\]
This regular expression will do the following:
capture the substring inside square brackets, as in [some value]
where [value] is in the attributes area of a tag
where [value] is not inside the attributes area of a tag
provided the substring is not nested inside an attribute value, as in <input attrib=" [value] ">
the captured substring will not include the wrapping square brackets
allow any tag name, or replace the \w+ with the desired tag names
allow value to be any string of characters
avoid difficult edge cases
Note: this regex is best used with the following flags:
global
dot matches new line
ignore white space in expression
allow duplicate named capture groups
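In PHP those flags map roughly onto preg_match_all (global), the s modifier (dot matches new line) and the x modifier (ignore white space in the expression). The sketch below is an adaptation under stated assumptions, not the exact expression above: the two capture groups are left unnamed and read by number, which sidesteps the duplicate-names option (in PCRE that is usually switched on with (?J)). The sample HTML and the names $html, $sets and $values are hypothetical.

```php
<?php
$html = <<<'HTML'
<div [first] ></div>
<div>[second]</div>
<input attr="attribute[hidden]"/>
HTML;

// s = dot matches new line, x = ignore white space in the expression.
// The groups are unnamed here, so no duplicate-names option is needed.
$pattern = <<<'REGEX'
~
  <\w+\s(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\[([^\]]*)\])
|
  <\w+\s?(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>
  (?:(?!</div>)(?!\[).)*\[([^\]]*)\]
~sx
REGEX;

preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);

$values = [];
foreach ($sets as $set) {
    // Group 2 is filled by the second alternative, group 1 by the first.
    $values[] = ($set[2] ?? '') !== '' ? $set[2] : $set[1];
}
print_r($values);
```

The bracketed text inside the attr attribute is not reported, because the quoted attribute value is consumed as a unit before the expression looks for an opening bracket.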
Examples
Live Demo
https://regex101.com/r/tT0bN5/1
Sample Text
<div [value 1] ></div>
<div>[value 2]</div>
but not find a match in this example
<div attr="attribute[value 3]"/>
<img [value 4]>
[value 6]
Sample Matches
MATCH 1
DesiredValue [6-13] `value 1`
MATCH 2
DesiredValue [29-36] `value 2`
MATCH 3
DesiredValue [121-128] `value 4`
MATCH 4
DesiredValue [159-166] `value 6`
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
<div '<div'
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
----------------------------------------------------------------------
[^>=] any character except: '>', '='
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
=' '=\''
----------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
' '\''
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
=" '="'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
= '='
----------------------------------------------------------------------
[^'"] any character except: ''', '"'
----------------------------------------------------------------------
[^\s>]* any character except: whitespace (\n,
\r, \t, \f, and " "), '>' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
)*? end of grouping
----------------------------------------------------------------------
\[ '['
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^\]]* any character except: '\]' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\] ']'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
<div '<div'
----------------------------------------------------------------------
\s? whitespace (\n, \r, \t, \f, and " ")
(optional (matching the most amount
possible))
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
[^>=] any character except: '>', '='
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
=' '=\''
----------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
' '\''
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
=" '="'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
= '='
----------------------------------------------------------------------
[^'"] any character except: ''', '"'
----------------------------------------------------------------------
[^\s>]* any character except: whitespace (\n,
\r, \t, \f, and " "), '>' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
> '>'
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
< '<'
----------------------------------------------------------------------
\/ '/'
----------------------------------------------------------------------
div> 'div>'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
\[ '['
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
\[ '['
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
[^\]]* any character except: '\]' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
\] ']'

Notepad++ deleting tags with specific text inside

I have a large XML file with products inside. I'm trying to delete all products which are out of stock. File size is over 20MB.
<product>
<name>bla1</name>
<price>50$</price>
<stock>yes</stock>
<description>bla</description>
</product>
<product>
<name>bla2</name>
<price>60$</price>
<stock>no</stock>
<description>bla</description>
</product>
...
Is it possible to delete them using Notepad++'s regex or should I use simpleXML(PHP) or something similar?
My basic PHP code:
$url = 'input/products.xml';
$xml = new SimpleXMLElement(file_get_contents($url));
foreach ($xml->product->children() as $product) {
//finding out of stock products and deleting them
}
$xml->asXml('output/products.xml');
Foreword
Pattern matching with regular expressions is not ideal here. If you have access to PHP, I recommend using a proper XML parsing tool instead. With that said, here is a solution you can use in Notepad++.
Description
<product\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>(?:(?!</product).)*<stock\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>no</stock>(?:(?!</product).)*<\/product>
Replace with: nothing
This Regular Expression will do the following:
find the entire product section
require the subtag stock
require the subtag stock to have a value of no
avoid the extreme edge cases that make pattern matching in HTML difficult
From Notepad++
Note that you should be using Notepad++ version 6.1 or later, as older versions had problems with regular expressions that have since been fixed.
Press Ctrl+H to open the Find and Replace dialog
Select the Regular Expression option
Tick ". matches newline", since the expression relies on . crossing line breaks
In the "Find what" field place the regular expression
Leave the "Replace with" field empty
Click Replace All
Example
Live Demo
https://regex101.com/r/cW9nC5/1
Sample text
<product>
<name>bla1</name>
<price>50$</price>
<stock>yes</stock>
<description>bla</description>
</product>
<product>
<name>bla2</name>
<price>60$</price>
<stock>no</stock>
<description>bla</description>
</product>
After Replace
<product>
<name>bla1</name>
<price>50$</price>
<stock>yes</stock>
<description>bla</description>
</product>
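For comparison, the parser-based route recommended in the foreword could look roughly like this in PHP. It is a minimal sketch using DOMDocument and DOMXPath rather than SimpleXML, and it assumes the products sit under a single root element (called <products> here, which the question does not show).

```php
<?php
// Hypothetical wrapper: a <products> root is assumed, the question does not show one.
$xml = <<<'XML'
<products>
<product><name>bla1</name><price>50$</price><stock>yes</stock></product>
<product><name>bla2</name><price>60$</price><stock>no</stock></product>
</products>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Copy the node list to an array before removing nodes from the tree.
foreach (iterator_to_array($xpath->query('//product[stock="no"]')) as $product) {
    $product->parentNode->removeChild($product);
}

echo $dom->saveXML();
```

For the real file, $dom->load('input/products.xml') and $dom->save('output/products.xml') would take the place of loadXML and saveXML, using the paths from the question.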
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
<product '<product'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the least amount possible)):
----------------------------------------------------------------------
[^>=] any character except: '>', '='
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
=' '=\''
----------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
' '\''
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
=" '="'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
= '='
----------------------------------------------------------------------
[^'"] any character except: ''', '"'
----------------------------------------------------------------------
[^\s>]* any character except: whitespace (\n,
\r, \t, \f, and " "), '>' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
)*? end of grouping
----------------------------------------------------------------------
\s? whitespace (\n, \r, \t, \f, and " ")
(optional (matching the most amount
possible))
----------------------------------------------------------------------
\/? '/' (optional (matching the most amount
possible))
----------------------------------------------------------------------
> '>\r\n'
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
</product '</product'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
<stock '<stock'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the least amount possible)):
----------------------------------------------------------------------
[^>=] any character except: '>', '='
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
=' '=\''
----------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
' '\''
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
=" '="'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
= '='
----------------------------------------------------------------------
[^'"] any character except: ''', '"'
----------------------------------------------------------------------
[^\s>]* any character except: whitespace (\n,
\r, \t, \f, and " "), '>' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
)*? end of grouping
----------------------------------------------------------------------
\s? whitespace (\n, \r, \t, \f, and " ")
(optional (matching the most amount
possible))
----------------------------------------------------------------------
\/? '/' (optional (matching the most amount
possible))
----------------------------------------------------------------------
>no</stock> '>no</stock>'
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
</product '</product'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
< '<'
----------------------------------------------------------------------
\/ '/'
----------------------------------------------------------------------
product> 'product>'
----------------------------------------------------------------------
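The core of the pattern explained above is the `(?:(?!</product).)*` group, which consumes characters one at a time but refuses to step over a closing `</product`, so a match can never spill from one product into the next. A minimal PHP sketch of that idea follows. It is a simplified assumption, not the answer's exact regex: the attribute-handling alternation is left out, the `s` flag is added so that `.` also matches line breaks, and a trailing `\s*` removes the line break left behind.

```php
<?php
// Sample data from the answer (nowdoc, so "$" is taken literally).
$xml = <<<'XML'
<product>
<name>bla1</name>
<price>50$</price>
<stock>yes</stock>
<description>bla</description>
</product>
<product>
<name>bla2</name>
<price>60$</price>
<stock>no</stock>
<description>bla</description>
</product>
XML;

// (?:(?!</product).)* advances one character at a time and stops
// before any "</product", keeping the match inside a single product.
// Simplified: no attribute handling; "s" lets "." match newlines.
$pattern = '~<product>(?:(?!</product).)*<stock>no</stock>(?:(?!</product).)*</product>\s*~s';

$result = preg_replace($pattern, '', $xml);
echo $result;
```

Only the product whose `<stock>` is `no` is removed; the first product fails the match because the look-ahead stops the scan before it can reach the second product's `<stock>no</stock>`.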
I guess notepad++ will be easier, i.e.:
FIND : <product>\s+<name>.*?<\/name>\s+<price>.*?<\/price>\s+<stock>no<\/stock>\s+<description>.*?<\/description>\s+<\/product>
REPLACE : with nothing
DEMO
https://regex101.com/r/fH0mM7/1
NOTE
Make sure you check Regular Expression at the bottom
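The same find-and-replace can also be tried outside Notepad++. The sketch below runs the FIND pattern through PHP's `preg_replace`, written here with the full `<\/description>` closing tag; the variable names and the inline sample string are assumptions for illustration. Note that the pattern expects exactly this element order and at least one whitespace character between elements.

```php
<?php
// Inline copy of the sample text (nowdoc, so "$" is taken literally).
$xml = <<<'XML'
<product>
<name>bla1</name>
<price>50$</price>
<stock>yes</stock>
<description>bla</description>
</product>
<product>
<name>bla2</name>
<price>60$</price>
<stock>no</stock>
<description>bla</description>
</product>
XML;

// The FIND pattern, wrapped in "/" delimiters; every "/" inside is escaped.
// "." does not match newlines here, so each .*? stays on its own line.
$find = '/<product>\s+<name>.*?<\/name>\s+<price>.*?<\/price>\s+'
      . '<stock>no<\/stock>\s+<description>.*?<\/description>\s+<\/product>/';

// REPLACE with nothing.
$result = preg_replace($find, '', $xml);
echo $result;
```

Because the element order is hard-coded, a product with extra or reordered child elements would not be matched by this pattern.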
You can do this with PHP using the below code
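<?php
$url = 'input/products.xml';
$xml = new SimpleXMLElement(file_get_contents($url));
// Iterate backwards so that removing a product does not shift
// the indexes of the products still to be checked.
for ($i = count($xml->product) - 1; $i >= 0; --$i) {
    $product = $xml->product[$i];
    // Compare the text content of <stock> with "no".
    if ((string) $product->stock === 'no') {
        unset($xml->product[$i]);
    }
}
$xml->asXML('output/products.xml');
?>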
