I have a string:
$str="(94896)content is here(/94896)(94897)content is here(/94897)(94898)content is here(/94898)(94899)content is here(/94899)";
The (number) and (/number) markers act as tags that delimit the content I want to take out of the string.
I have a preg_match call to take the content out:
if(preg_match('/(94896)\"(.*)\"(\/94896)/',$str,$c)) {echo "I found the content, its:".$co[1];}
Now for some reason, it doesn't find a match in the string ($str), though it's clearly there...
Any ideas on what I'm doing wrong here?
You need to take the double-quotes out of your regex string, since they don't appear in $str, but are expected by the regex.
'/(94896)\"(.*)\"(\/94896)/'
//       ^^    ^^
// These aren't in the string.
EDIT: I think you'll also need to escape your parentheses, since they are read as grouping operators, not as literal characters.
Your expression should be:
'/\(94896\)(.*)\(\/94896\)/'
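As a minimal sketch of the corrected expression in use, assuming the tag number arrives in a variable: the names $id, $open, $close and $pattern are illustrative and not from the question. preg_quote() does the escaping of the parentheses, and of the / delimiter when it is passed as the second argument.

```php
<?php
$str = "(94896)content is here(/94896)(94897)content is here(/94897)";

// Hypothetical variable holding the tag number to look up.
$id = "94896";

// preg_quote() escapes regex metacharacters such as ( and );
// passing '/' as the second argument also escapes the delimiter.
$open  = preg_quote("($id)", '/');
$close = preg_quote("(/$id)", '/');
$pattern = '/' . $open . '(.*?)' . $close . '/';

$content = null;
if (preg_match($pattern, $str, $c)) {
    $content = $c[1]; // group 1 holds the text between the tags
    echo "I found the content, it's: " . $content . "\n";
}
```

Building the pattern this way avoids escaping each parenthesis by hand when the number changes.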
Parentheses are used in a regex to denote subpatterns. If you want to search these characters in a string, you must escape them:
preg_match('/\(94896\)(.*)\(\/94896\)/',$str,$c)
If the pattern is found:
echo "I found the content, it's: ".$c[1]; // $c[0] is the whole match, $c[1] the captured content
Oh, and as Karl Nicoll says, why are the quotation marks in your pattern?
To match all content:
$str="(94896)content is here(/94896)(94897)content is here(/94897)(94898)content is here(/94898)(94899)content is here(/94899)";
$re = '/\((\d+)\)(.*)\(\/\1\)/';
preg_match_all($re, $str, $matches,PREG_SET_ORDER);
var_dump($matches);
Number will be in $matches[*][1], content in $matches[*][2].
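A short sketch of how the match sets might be consumed, assuming a shortened input with made-up contents ("first", "second", "third") so the results are distinguishable. It uses a lazy (.*?) variant of the pattern, and $byNumber is an illustrative name.

```php
<?php
// Shortened input with invented contents, for illustration only.
$str = "(94896)first(/94896)(94897)second(/94897)(94898)third(/94898)";

// Lazy (.*?) stops at the first closing tag whose number equals group 1.
$re = '/\((\d+)\)(.*?)\(\/\1\)/';
preg_match_all($re, $str, $matches, PREG_SET_ORDER);

// Build a number => content map from the match sets.
$byNumber = [];
foreach ($matches as $m) {
    $byNumber[$m[1]] = $m[2];
}
print_r($byNumber);
```

With PREG_SET_ORDER each element of $matches is one complete match, which makes the loop straightforward.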
I'm a regex newbie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I'm trying to achieve is getting the number sequence (aka job ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can have multiple numbers in it. I only need the sequence right before ".aspx". Also, there could be some parameters after the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
//=> 146370543
This matches a run of characters that are neither hyphens nor dots and that is immediately followed by .aspx, which the lookahead (?=\.aspx) checks without consuming it.
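A minimal sketch of that pattern applied to the URL from the question and to a made-up variant with "&mobile=true" appended (the second URL and the $ids array are assumptions for illustration):

```php
<?php
$base = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";

// The second entry is an invented variant with a trailing parameter.
$urls = [$base, $base . "&mobile=true"];

$re = '/[^-.]+(?=\.aspx)/i';
$ids = [];
foreach ($urls as $url) {
    if (preg_match($re, $url, $matches)) {
        // The lookahead is not part of the match, so $matches[0] is the ID itself.
        $ids[] = $matches[0];
    }
}
print_r($ids);
```

Because the pattern is not anchored to the end of the string, text after ".aspx" does not affect the result.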
You can just use preg_match (you don't need preg_replace, since you don't want to change the original string) and capture the number before the .aspx. In the sample URL the .aspx is at the end, so the simplest way I could think of is:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
$regex = '/([0-9]+)\.aspx$/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of results. The first element holds the text matched by the complete regex, which is 146370543.aspx in this example. The second element holds the group captured by the parentheses around [0-9]+.
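The layout of the result array can be checked with a short sketch. It also shows, as an assumption about the "&mobile=true" case mentioned in the question, that the $ anchor stops matching once something follows ".aspx"; the variable names $anchored and $loose are illustrative.

```php
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";

preg_match('/([0-9]+)\.aspx$/', $string, $results);
// $results[0]: text matched by the whole regex
// $results[1]: text captured by the first pair of parentheses
var_dump($results);

// Invented variant of the URL with a trailing parameter.
$withParam = $string . "&mobile=true";

// The anchored pattern no longer matches (preg_match returns 0) ...
$anchored = preg_match('/([0-9]+)\.aspx$/', $withParam);

// ... while the same pattern without $ still captures the ID.
preg_match('/([0-9]+)\.aspx/', $withParam, $loose);
echo $loose[1], "\n";
```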
You can get the opposite by using this regex:
(\D*)\d+(.*)
Working demo
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
And if you just want the number for that URL, you can use this regex:
(\d+)
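If preg_replace is really required, one possible approach (a sketch, not the only way) is to match the entire URL, capture only the digits before ".aspx", and replace everything with that capture. The variable names are illustrative, and the second URL is a made-up variant.

```php
<?php
$url = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";

// ^.*-      everything up to the hyphen in front of the ID
// (\d+)     the ID itself (group 1)
// \.aspx.*$ the extension and anything that follows it
$pattern = '/^.*-(\d+)\.aspx.*$/i';

$id = preg_replace($pattern, '$1', $url);
$idWithParam = preg_replace($pattern, '$1', $url . "&mobile=true");

echo $id, "\n", $idWithParam, "\n";
```

Note that preg_replace returns the subject unchanged when the pattern does not match, so a result that still looks like a URL means no ID was found.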
Using preg_match_all(), I want to match something like:
...randomtext...>MATCH1</a>" (MATCH2)"...randomtext... EDIT: to clarify, this is exactly the string I'm trying to extract data from, including the brackets, quotes, angle-brackets etc.
Here's what I've tried: preg_match_all("/^>(.+?)</a>\" \((.+?)\)\"$/", $htmlfile, $matches);
It should extract MATCH1 as $matches[1][0] and MATCH2 as $matches[2][0]
Any idea why it isn't working?
Thanks
You didn't escape the / in your end tag </a>.
This should work:
preg_match_all("/>(.+?)<\/a>\" \((.*?)\)/", $htmlfile, $matches);
You need to escape the / in your pattern, and you don't want your pattern anchored to ^ and $
So probably this will work: preg_match_all("/>(.+?)<\/a>\" \((.+?)\)\"/", $htmlfile, $matches);
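An alternative sketch that sidesteps the escaping problem by choosing a different delimiter. The contents of $htmlfile are made up to follow the shape described in the question.

```php
<?php
// Invented input following the pattern >MATCH1</a>" (MATCH2)" from the question.
$htmlfile = 'foo <a href="#">First link</a>" (first note)" bar '
          . '<a href="#">Second link</a>" (second note)" baz';

// With # as the delimiter, the / in </a> needs no escaping.
// Single quotes keep PHP from interpreting the backslashes itself.
preg_match_all('#>(.+?)</a>" \((.+?)\)"#', $htmlfile, $matches);

print_r($matches[1]); // text before </a>
print_r($matches[2]); // text inside the parentheses
```

Without ^ and $ the pattern can match anywhere in the input, which is what preg_match_all needs to find every occurrence.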
This is the text sample:
$text = "asd dasjfd fdsfsd http://11111.com/asdasd/?s=423%423%2F gfsdf http://22222.com/asdasd/?s=423%423%2F
asdfggasd http://3333333.com/asdasd/?s=423%423%2F";
This is my regex pattern:
preg_match_all( "#http:\/\/(.*?)[\s|\n]#is", $text, $m );
That matches the first two URLs, but how do I match the last one? I tried using [\s|\n|$] instead, but that also matches only the first two URLs.
Don't try to match \n separately (\s already covers line breaks, and there's no line break after the last URL anyway). Use $ instead, which matches at the end of the string.
Edit:
I'd love to hear why my initial idea doesn't work, so in case you know it, let me know. I'd guess because [] tries to match one character, while end of line isn't one? :)
This one will work:
preg_match_all('#http://(\S+)#is', $text, $m);
Note that you don't have to escape the / since it isn't the delimiting character, but you would have to escape each \ if you used double quotes (because PHP parses the string). I used single quotes instead.
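Run against the sample text from the question, the \S+ pattern can be checked like this (a minimal sketch; the s modifier is dropped because the pattern contains no dot):

```php
<?php
$text = "asd dasjfd fdsfsd http://11111.com/asdasd/?s=423%423%2F gfsdf http://22222.com/asdasd/?s=423%423%2F
asdfggasd http://3333333.com/asdasd/?s=423%423%2F";

// \S+ takes every non-whitespace character, so it stops at a space,
// at a line break, or at the end of the string.
preg_match_all('#http://(\S+)#i', $text, $m);

print_r($m[1]);
```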
I'm not familiar with PHP, so I don't have the exact syntax, but maybe this will give you something to try. The [] denotes a character class, so |$ inside it will literally look for a $. I think what you need is a lookahead, something like this:
#http:\/\/(.*?)(?=\s|$)#
I apologize if this is way off, but maybe it will give you another angle to try.
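The difference between $ inside a character class and $ inside an alternation can be seen in a small sketch. The test strings and the example.* URLs are made up for illustration.

```php
<?php
// Inside [...], | and $ are ordinary characters, not alternation or an anchor.
$class = '/[\s|$]/';
$pipe   = preg_match($class, 'a|b'); // the literal | matches
$dollar = preg_match($class, 'a$b'); // the literal $ matches
$plain  = preg_match($class, 'ab');  // the end of the string does not match

// Inside a group or lookahead, | is alternation and $ is the end-of-string anchor.
$text = "x http://a.example/1 http://b.example/2";
preg_match_all('#http://(.*?)(?=\s|$)#', $text, $m);

var_dump($pipe, $dollar, $plain);
print_r($m[1]);
```

The lazy (.*?) matters here: a greedy (.*) would run to the end of the line before the lookahead is tested.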
See What is the best regular expression to check if a string is a valid URL?
It has some very long regular expressions that will match all urls.
Hi Guys I'm very new to regex, can you help me with this.
I have a string like this "<input attribute='value' >" where attribute='value' could be anything, and I want to do a preg_replace to get just <input />
How do I specify a wildcard to replace any number of any characters in a string?
like this? preg_replace("/<input.*>/",$replacement,$string);
Many thanks
What you have:
.*
will match "any character, and as many as possible".
What you mean is
[^>]+
which translates to "any character that's not a ">", and there must be at least one",
or alternatively,
.*?
which means
"any character, but only enough to make this rule work"
BUT DON'T
Parsing HTML with regexps is Bad
Use any of the existing HTML parsers, DOM libraries, anything, just NOT NAÏVE REGEX
For example:
<foo attr=">">
Will get grabbed wrongly by regex as
'<foo attr=" ' with following text of '">'
Which will lead you to this regex:
<[a-zA-Z]+( [a-zA-Z]+=['"][^"']*['"])*> etc etc
at which point you'll discover this lovely gem:
<foo attr="'>\'\"">
and your head will explode.
(The syntax highlighter verifies my point: it matches incorrectly, thinking I've ended the tag.)
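The failure described above is easy to reproduce. A minimal sketch, using the <foo attr=">"> example with some made-up trailing text:

```php
<?php
// The attribute value itself contains a > character.
$html = '<foo attr=">">text';

// [^>]* stops at the first >, which sits inside the quoted attribute value.
preg_match('/<foo[^>]*>/', $html, $m);

$matched   = $m[0];
$remainder = substr($html, strlen($matched));

echo "matched:   ", $matched, "\n";
echo "remainder: ", $remainder, "\n";
```

The regex has no notion of quoting, so it cannot tell a > inside an attribute value from the one that closes the tag.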
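Some people were close... but not 100%:
This:
preg_replace("/<input[^>]*>/", $replacement, $string);
can also be written as:
preg_replace("/<input[^>]*?>/", $replacement, $string);
Note the / delimiters, which PHP requires around the pattern. The lazy quantifier makes the non-greedy intent explicit, although [^>]* can never run past the first > anyway.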
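To compare the quantifier variants that come up in these answers, here is a small sketch on a made-up input with two tags. The labels in $patterns are only for display.

```php
<?php
// Made-up input with two tags, so greedy matching has room to overshoot.
$string = "<input attribute='value' ><b>bold</b>";

$patterns = [
    'greedy dot'    => '/<input.*>/',
    'lazy dot'      => '/<input.*?>/',
    'negated class' => '/<input[^>]*>/',
    'lazy negated'  => '/<input[^>]*?>/',
];

$found = [];
foreach ($patterns as $label => $pattern) {
    preg_match($pattern, $string, $m);
    $found[$label] = $m[0];
    echo str_pad($label, 14), ' => ', $m[0], "\n";
}
```

Only the greedy dot runs through to the last > in the string; the other three stop at the end of the input tag.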
preg_replace("/<input[^>]*>/", $replacement, $string);
// [^>] means "any character except the greater than symbol / right tag bracket"
This is really basic stuff, you should catch up with some reading. :-)
If I understand the question correctly, you have the code:
preg_replace("/<input.*>/",$replacement,$string);
and you want us to tell you what you should use for $replacement to delete what was matched by .*
You have to go about this the other way around. Use capturing groups to capture what you want to keep, and reinsert that into the replacement. E.g.:
preg_replace("/(<input).*(>)/","$1$2",$string);
Of course, you don't really need capturing groups here, as you're only reinserting literal text. But the above shows the technique, in case you want to do this in a situation where the tag can vary. This is a better solution:
preg_replace("/<input [^>]*>/","<input />",$string);
The negated character class is more specific than the dot. This regex will work if there are two HTML tags in the string. Your original regex won't.
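For the case where the tag can vary, the capturing-group technique might look like this. It is a sketch under assumptions: the input string and the choice of input and select as tag names are made up.

```php
<?php
// Illustrative input: two different tags, each with attributes to drop.
$string = "<input type='text' name='a'> and <select name='b'>";

// Group 1 captures the tag name; $1 reinserts it into the replacement.
// \b keeps the alternation from matching longer tag names that merely start the same way.
$result = preg_replace('/<(input|select)\b[^>]*>/', '<$1 />', $string);

echo $result, "\n";
```

The replacement is written in single quotes so that PHP does not try to interpolate $1 as a variable.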
I am using preg_replace() for some string replacement.
$str = "<aa>Let's find the stuff qwe in between <id>12345</id> these two previous brackets</h>";
$do = preg_match("/qwe(.*)12345/", $str, $matches);
which is working just fine and gives the following result
$match[0]=qwe in between 12345
$match[1]=in between
but I am using same logic to extract from the following string.
<text>
<src><![CDATA[<TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT FACE="Arial" SIZE="36" COLOR="#999999" LETTERSPACING="0" KERNING="0">r1 text 1 </FONT></P></TEXTFORMAT>]]></src>
<width>45%</width>
<height>12%</height>
<left>30.416666666666668%</left>
<top>3.0416666666666665%</top>
<begin>2s</begin>
<dur>10s</dur>
<transIn>fadeIn</transIn>
<transOut>fadeOut</transOut>
<id>E2159292994B083ACA7ABC7799BBEF3F7198FFA2</id>
</text>
I want to extract the string from
r1text1
to
</id>
The Regular expression I currently Have is:
preg_match('/r1text1(.*)</id\>/', $metadata], $matches);
where $metadata is the above string..
$matches does not return anything, for some reason.
How do I do it?
Thanks in advance
If you want to extract the text, you will probably want to use preg_match. The following might work:
preg_match('#\<P[^\>]*\>\<FONT[^\>]*\>(.*\</id\>)#s', $string, $matches);
Whatever gets matched in the parentheses can be found later in the $matches array. In this case that is everything between a <P> tag followed by a <FONT> tag and </id>, including the latter. The s modifier lets the dot match the line breaks in between.
Above regex is untested but might give you a general idea of how to do it. Adapt if your needs are a bit different :)
Even though I don't know why you would match the regex against an incomplete XML fragment (starting inside a <![CDATA[ section and ending right before the closing XML tag </id>), you do have three obvious problems with your regex:
1. As Amri said: you have to escape the / character in the closing XML tag because you use / as the pattern delimiter. By the way, you don't have to escape the > character. That gives you: '/r1text1(.*)<\/id>/'. Alternatively, you can change the pattern delimiter, to # for example: '#r1text1(.*)</id>#' (I will use the first pattern to develop the expression further).
2. As Rich Adams already said: the text in your example data is "r1_text_1" (_ stands for a space character), but you match against '/r1text1(.*)<\/id>/'. You have to include the spaces in your regex or allow for an uncertain number of spaces, such as '/r1(?:\s*)text(?:\s*)1(.*)<\/id>/' (the ?: is the syntax for non-capturing subpatterns).
3. The . (dot) in your regex does not match newlines by default. You have to add the s (PCRE_DOTALL) pattern modifier to let the dot match newlines as well: '/r1(?:\s*)text(?:\s*)1(.*)<\/id>/s'
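The three fixes can be tried together in a short sketch. The XML here is a shortened, simplified stand-in for the data in the question, and the # delimiter is used instead of escaping the slash.

```php
<?php
// Shortened stand-in for the XML in the question; note the line breaks.
$metadata = <<<'XML'
<src><![CDATA[<FONT FACE="Arial">r1 text 1 </FONT>]]></src>
<width>45%</width>
<id>E2159292994B083ACA7ABC7799BBEF3F7198FFA2</id>
XML;

// Without the s modifier the dot cannot cross the line breaks: no match.
$withoutS = preg_match('#r1\s*text\s*1(.*)</id>#', $metadata);

// # as delimiter: the / in </id> needs no escaping.
// \s* tolerates the spaces in "r1 text 1".
// s modifier: the dot also matches newlines.
$withS = preg_match('#r1\s*text\s*1(.*)</id>#s', $metadata, $matches);

var_dump($withoutS, $withS);
echo $matches[1], "\n";
```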
You probably need to parse your string/file and extract the value inside the FONT tag, then insert the value into the id tag.
Try googling for PHP parsing.
Try this:
preg_match('/r1text1(.*)<\/id\>/', $metadata, $matches);
You are using / as the pattern delimiter, but your pattern also contains a / in </id>. You can use \ as the escape character.
In the sample you have "r1 text 1 ", yet your regular expression has "r1text1". The regular expression doesn't match because there are spaces in the string you are trying to match it against. You should include the spaces in the regular expression.