I have this preg_split() call with a pattern that splits on any <br> tag.
However, I would like to add more patterns to it besides <br>.
How can I do that with the current line of code below?
preg_split('/<br[^>]*>/i', $string, 25);
Thanks in advance!
I can't comment yet, which is why I'm posting this as an answer.
Tell me what you need implemented,
or use a website like PHP live regex creator to build the pattern.
PHP's preg_split() function only accepts a single pattern argument, not multiple. So you have to use the power of regular expressions (here, alternation with |) to match all of your delimiters.
This would be an example:
preg_split('/(<br[^>]*>)|(<p[^>]*>)/i', $string, 25);
It matches HTML line breaks and/or paragraph tags.
It is helpful to test your expressions with a regex tool, either a local one or a web-based service like https://regex101.com/
The above splits the example text
this is a <br> text
with line breaks <br /> and
stuff like <p>, correct?
like this:
Array
(
[0] => this is a
[1] => text
with line breaks
[2] => and
stuff like
[3] => , correct?
)
Note, however, that for parsing HTML markup a DOM parser is probably the better alternative. You don't risk stumbling over escaped characters and the like.
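A minimal sketch of the same idea, assuming the goal is simply to treat <br> and <p> tags as delimiters. Both tag names go into one non-capturing alternation; the \b after the name is an addition not in the answer above, so that tags such as <pre> or <param> are not split on:

```php
<?php
// One pattern, several delimiters: <br ...> or <p ...>, case-insensitive.
// \b after the tag name keeps <pre> or <param> from counting as delimiters.
$pattern = '/<(?:br|p)\b[^>]*>/i';

$string = "this is a <br> text\nwith line breaks <br /> and\nstuff like <p>, correct?";
$parts = preg_split($pattern, $string, 25);
print_r($parts);

// A string without <br> or <p> tags comes back as a single element.
$untouched = preg_split($pattern, 'keep <pre>this</pre> intact', 25);
print_r($untouched);
```

To add more delimiters later, extend the alternation, e.g. (?:br|p|hr).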
I'm going to be working with a long string of data that is serialized into blocks using a pattern (x:y).
However, I struggle with regular expressions and am looking for resources to help me work out how to construct a regex that identifies any/all of these blocks as they appear in a string.
For example, given the following string:
$s = 't:user c:red t:admin n:"bob doe" s:expressionsf:json';
Note: the f:json at the end is missing a space on purpose, because the format may vary depending on how the string is eventually given to me. The blocks may or may not be separated by spaces.
How would I identify each block of x:y to end with the below result:
Array
(
[0] => t:user
[1] => c:red
[2] => t:admin
[3] => n:"bob doe"
[4] => s:expressions
[5] => f:json
)
I've tested various expressions using my limited knowledge, but have not been terribly successful.
I can successfully match the pattern using something like this:
^[ctrns]:.+
But this unfortunately matches the entire string. The part I seem to be missing is how to break up each block while keeping the spaces inside a pair (see the n:"bob doe" example).
Any assistance would be super appreciated! Also, ideally any answer would explain what each token in the expression accomplishes, so that I can improve my understanding of these techniques.
I've been using https://regexr.com/ to practice.
You may use this regex in preg_match_all:
[ctnsf]:(?:"[^"\\]*(?:\\.[^"\\]*)*"|\S+?(?=[ctnsf]:|\s|$))
RegEx Details:
[ctnsf]:: Match one of the characters c, t, n, s or f, followed by :
(?:"[^"\\]*(?:\\.[^"\\]*)*": Match a quoted substring. This takes care of escaped quotes as well.
|: OR
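\S+?: Match one or more non-whitespace characters, as few as possible (lazy)
(?=[ctnsf]:|\s|$): Positive lookahead asserting that what follows is the start of the next block (one of ctnsf followed by :), a whitespace character, or the end of the string.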
Code:
$re = '/[ctnsf]:(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\S+?(?=[ctnsf]:|\s|$))/m';
$str = 't:user c:red t:admin n:"bob \\"doe" s:expressionsf:json';
preg_match_all($re, $str, $matches);
// Print the entire match result
print_r($matches[0]);
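If you also want the key and the value of each block separately, one option is to wrap the two halves of the same pattern in named groups and use PREG_SET_ORDER. This is a sketch built on the answer's pattern; the group names key and value are arbitrary choices:

```php
<?php
// The answer's pattern with two named groups added:
//   key   = the single-letter prefix
//   value = either a quoted string (escaped quotes allowed) or a bare token
$re = '/(?<key>[ctnsf]):(?<value>"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\S+?(?=[ctnsf]:|\s|$))/';
$str = 't:user c:red t:admin n:"bob doe" s:expressionsf:json';

preg_match_all($re, $str, $m, PREG_SET_ORDER);

$pairs = [];
foreach ($m as $match) {
    $pairs[] = [$match['key'], $match['value']];
}
print_r($pairs);
```

Quoted values keep their surrounding quotes; trim($value, '"') removes them if you don't need them.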
I want to build a pattern that replaces integers which are the only thing on their line.
If there is a word on the same line as the integer, I do not want to change anything.
I tried $pattern = '/(.)[0-9][0-9](.)/';
but this doesn't work well for me,
and when I try, for example, $pattern = '/1(.)2(.)3(.)/'; it only replaces the specific numbers that I put in the pattern.
For example, I want to replace the numbers in this subject:
subject = "
1
2
3
4
5
6
7
8
9
10
"
The numbers must be integers, not decimals, and their count varies, but there must not be any text on the same line as a number.
Any ideas?
Regexes are not a magic wand that answers every programming problem.
In this case, it sounds like you actually want to use explode() to break your subject apart on \n, manipulate the lines as an array, and then rebuild the subject with implode(). It's much easier to deal with two lines when they are $lines[$x] and $lines[$y].
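A minimal sketch of that line-based approach, assuming \n line endings and using NUM as a placeholder replacement (neither comes from the question). ctype_digit() only accepts non-empty strings made of the digits 0-9, so decimals and lines containing words are left alone:

```php
<?php
$subject = "1\n2\nfoo 3\n4 bar\n5.5\n10";

$lines = explode("\n", $subject);
foreach ($lines as $i => $line) {
    // trim() strips surrounding spaces/tabs (and a stray \r from \r\n endings)
    if (ctype_digit(trim($line))) {
        $lines[$i] = 'NUM'; // placeholder replacement
    }
}
$result = implode("\n", $lines);
echo $result, "\n";
```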
Further, if the lines are coming from HTML, then you don't want to be using regular expressions to parse the HTML. Instead, you want to use the PHP DOM module. http://php.net/manual/en/book.dom.php
What you need is multiline mode, the m modifier. With it, ^ matches at the beginning and $ at the end of each line. Then use \d+ for an integer of any length:
/^[ \t]*\d+[ \t]*$/m
The [ \t]* parts allow any number of spaces and tabs around the number. Note that this will remove the whitespace along with the number. If you want to keep the whitespace, use
/^([ \t]*)\d+([ \t]*)$/m
And change your replacement string to
$1yourReplacementString$2
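Putting the whitespace-preserving pattern and the replacement together in preg_replace(), with X standing in for your replacement string and a made-up sample subject:

```php
<?php
$subject = "1\n2\nfoo 3\n  4  \n5 bar\n5.5\n10";

// m modifier: ^ and $ match at every line start/end, not just the string's.
// $1 and $2 put back the spaces/tabs captured around the number.
$result = preg_replace('/^([ \t]*)\d+([ \t]*)$/m', '$1X$2', $subject);
echo $result, "\n";
```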
EDIT:
I realize now that you don't have actual line breaks (\r, \n or \r\n) but <br> tags. That makes it a bit more difficult. Something like this should cover most cases:
~((?:^|<br[ ]*/?>)[ \t]*)\d+([ \t]*)(?=$|<br\b)~
Again, you need to add $1 and $2 around your replacement string to not remove the <br> tags.
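A quick check of a <br>-based version on a made-up input. It writes the trailing tag as a lookahead, (?=$|<br\b), so the <br that follows one number is not consumed and can still serve as the leading tag for a number on the next "line"; that way numbers on consecutive lines are all replaced:

```php
<?php
$html = "1<br>2<br />foo 3<br>4 bar<br>10";

// Group 1: start of string or a <br>, <br/>, <br /> tag, plus spaces/tabs.
// Group 2: trailing spaces/tabs. The lookahead checks for end of string or
// another <br without consuming it, so back-to-back numbers all match.
$re = '~((?:^|<br[ ]*/?>)[ \t]*)\d+([ \t]*)(?=$|<br\b)~';
$result = preg_replace($re, '$1X$2', $html);
echo $result, "\n";
```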
But as Andy said, HTML should not be dealt with using regular expressions. Either use the built-in DOM module provided by PHP, or some 3rd party library like this one.
This is how far I got.
This is working:
$urls = $this->match_all('/<a href="(http:\/\/www.imdb.de\/title\/tt.*?)".*?>.*?<\/a>/ms',
$content, 1);
Now I want to do the same with a different site.
But the links on that site have a different structure:
http://www.example.org/ANYTHING
I don't know what I am doing wrong but with this other site (example.org) it is not working.
Here is what I have tried
$urls = $this->match_all('/<a href="(http:\/\/www.example.org\/.*?)".*?>.*?<\/a>/ms',
$content, 1);
Thank you for your help. Stackoverflow is so awesome!
ANYTHING is usually represented by .*? (which you already use in your original regex). You could also use [^"]+ as a placeholder in your case.
It sounds like you want the following regular expression:
'/<a href="(http:\/\/www\.example\.org\/.*?)".*?>.*?<\/a>/ms'
You can also use a different delimiter to avoid escaping the forward slashes:
'#<a href="(http://www\.example\.org/.*?)".*?>.*?</a>#ms'
Note the escaping of the . in the domain name, as you intend to match a literal ., not any character.
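A small sketch showing why the escaped dots matter, using the # delimiter suggested above and [^"]+ as the ANYTHING placeholder. The two sample links are made up: the second has X and Y where the dots should be, which the unescaped pattern still accepts:

```php
<?php
$content = '<p><a href="http://www.example.org/ANYTHING">one</a> '
         . '<a href="http://wwwXexampleYorg/fake">two</a></p>';

// '#' as delimiter, so the '/' characters need no escaping.
// Escaped dots match only a literal '.'.
$strict = '#<a href="(http://www\.example\.org/[^"]+)"[^>]*>.*?</a>#s';
// Unescaped dots match any character.
$loose  = '#<a href="(http://www.example.org/[^"]+)"[^>]*>.*?</a>#s';

preg_match_all($strict, $content, $m);
print_r($m[1]);

// preg_match_all() returns the number of full matches.
$looseCount = preg_match_all($loose, $content);
echo $looseCount, "\n";
```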
I think this should help
/<a href="(http:\/\/www.example.org\/.*?)".*?>.*?<\/a>/ms
<a href="http://www.example.org/ANYTHING">text</a>
Result:
Array
(
[0] => <a href="http://www.example.org/ANYTHING">text</a>
[1] => http://www.example.org/ANYTHING
)
EDIT: I always find this site very useful when I want to try out preg_match: http://www.solmetra.com/scripts/regex/index.php
I have the following regular expression:
"^[x]{1}[a-z]{3,4}:[a-z0-9]{1,6}"
I want to use it to be able to match strings like:
xabc:z123
However, when I try it with this regex tester, it does not match. Is my pattern wrong, or is the online tester unreliable?
If my pattern is wrong, could someone point out why?
Also, I want to make the pattern matching case insensitive - but I'm not too sure the best way to do that (thought better to ask rather than trial and error). How do I change the pattern so it matches irrespective of case?
Just add an i for case insensitive matching:
/^[x]{1}[a-z]{3,4}:[a-z0-9]{1,6}/i
By the way, your regular expression works!?
Output:
Array
(
[0] => xabc:z123
)
If you want to have something like:
Array
(
[0] => 'xabc:z123',
[1] => 'x',
[2] => 'abc'
...
)
You need to add groups using (), e.g.:
/^([x]{1})([a-z]{3,4}):([a-z0-9]{1,6})/i
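For reference, this is how preg_match() fills the matches array for the grouped pattern, using the subject from the question:

```php
<?php
// $m[0] is the whole match; each (...) group adds the next numbered entry.
preg_match('/^([x]{1})([a-z]{3,4}):([a-z0-9]{1,6})/i', 'xabc:z123', $m);
print_r($m);
```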
In the tester, you have to enter the regex without the surrounding quotes. In PHP source code, you have to use quotes and a regex delimiter; the tester shows that in the code it generates:
$ptn = "/^[x]{1}[a-z]{3,4}:[a-z0-9]{1,6}/";
To make it case-insensitive, you have two options. One is to add an i after the closing delimiter, as middus's answer demonstrates. The other is to add (?i) to the regex itself:
(?i)^[x]{1}[a-z]{3,4}:[a-z0-9]{1,6}
The tester will accept it either way; if you don't add the delimiters yourself it adds / to either end, which means any slashes in your regex need to be escaped (i.e., it doesn't escape them for you). Be aware that PHP allows you to use other characters as the delimiters, but that tester only recognizes /.
Some further notes:
To match a single x, all you need is x. The square brackets are unnecessary when there's only one letter inside them, and the {1} quantifier never has any effect; it's pure clutter.
If you're using the regex to validate the string, you may want to add a $ anchor to the end.
End result:
/^x[a-z]{3,4}:[a-z0-9]{1,6}$/i
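A few quick checks of the end result. The extra sample strings are invented to exercise the case-insensitive flag, the {3,4} range and the $ anchor:

```php
<?php
$re = '/^x[a-z]{3,4}:[a-z0-9]{1,6}$/i';

$samples = ['xabc:z123', 'XABC:Z123', 'xabcd:123456', 'xab:z1', 'xabc:z1234567'];
$results = [];
foreach ($samples as $s) {
    // preg_match() returns 1 on a match, 0 otherwise (false on error)
    $results[$s] = preg_match($re, $s) === 1;
}
var_export($results);
```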
Here is another tester that lets you choose your own delimiters, among other things.
I'm just trying my hand at crafting my very first regex. I want to be able to match a pseudo HTML element and extract useful information such as tag name, attributes etc.:
$string = '<testtag alpha="value" beta="xyz" gamma="abc" >';
if (preg_match('/<(\w+?)(\s\w+?\s*=\s*".*?")+\s*>/', $string, $matches)) {
print_r($matches);
}
Except, I'm getting:
Array ( [0] => <testtag alpha="value" beta="xyz" gamma="abc" > [1] => testtag [2] => gamma="abc" )
Anyone know how I can get the other attributes? What am I missing?
Try this regular expression:
/<(\w+)((?:\s+\w+\s*=\s*(?:"[^"]*"|'[^']*'|[^'">\s]*))*)\s*>/
But you really shouldn’t use regular expressions for a context-free language like HTML. Use a real parser instead.
As has been said, don't use RegEx for parsing HTML documents.
Try this PHP parser instead: http://simplehtmldom.sourceforge.net/
Your second capturing group matches the attributes one at a time, each time overwriting the previous one. If you were using .NET regexes, you could use the Captures array to retrieve the individual captures, but I don't know of any other regex flavor that has that feature. Usually you have to do something like capture all of the attributes in one group, then use another regex on the captured text to break out the individual attributes.
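A minimal sketch of that two-step approach, restricted to double-quoted attribute values for brevity (the pattern in the first answer also handles single-quoted and unquoted ones):

```php
<?php
$string = '<testtag alpha="value" beta="xyz" gamma="abc" >';
$tagName = null;
$attributes = [];

// Step 1: one group for the tag name, one group for the whole run of attributes.
if (preg_match('/<(\w+)((?:\s+\w+\s*=\s*"[^"]*")*)\s*>/', $string, $tag)) {
    $tagName = $tag[1];
    // Step 2: a second regex splits the captured run into name/value pairs.
    preg_match_all('/(\w+)\s*=\s*"([^"]*)"/', $tag[2], $attr, PREG_SET_ORDER);
    foreach ($attr as $a) {
        $attributes[$a[1]] = $a[2];
    }
}

echo $tagName, "\n";
print_r($attributes);
```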
This is why people tend to either love regexes or hate them (or both). You can do some truly amazing things with them, but you also keep running into simple tasks like this one that are ridiculously hard, if not impossible.