I am trying to use a regexp to find the link that appears just before the string textABCXYZ123 in the HTML below.
lorem ispum...<strong>FIRSTlink </strong><br>
1 points| Saved Jan 08, 2014 at 00:49 <span class=notes_box>ANOTHERLINK</span>.
... more text........... more text........
... more text.......<strong>other link </strong><br>
1 points| Saved Jan 08, 2014 at 00:49 <span class=notes_box>ANOTHERLINK</span>.
... more text........... more text........
<strong>somewhere to go </strong><br>
1 points| Saved Jan 08, 2014 at 00:49 <span class=notes_box>textABCXYZ123</span>
...
... more text..........<strong>other link </strong><br>
1 points| Saved Jan 08, 2014 at 00:49 <span class=notes_box>ANOTHERLINK</span>.
... more text........... more text........
There are many links, and I need to capture the one that appears just before the textABCXYZ123 string. I tried the regex below, but it returns the first link instead of the last one before the string:
$find_string = 'ABCXYZ123';
preg_match('#href="(.*)".*text'.$find_string.'#sU',$html,$match);
// so the final result is "http://www.site.com/link/123", which is the first link
Can someone guide me how can I capture that link just before my string textABCXYZ123? P.S I know about xpath and simple html dom but I would like to match with regexp. Thanks for any input.
You could maybe try the regex:
href="([^"]*)">(?=(?:(?!href).)*textABCXYZ123)
Like so?
$find_string = 'ABCXYZ123';
preg_match('~href="([^"]*)">(?=(?:(?!href).)*text'.$find_string.')~sU',$html,$match);
The first part is href="([^"]*)"> and shouldn't be too hard to understand. It matches href=" and then any number of non-quote characters, followed by a closing quote and >.
(?=(?:(?!href).)*textABCXYZ123) is, first of all, a positive lookahead. (A positive lookahead has the format (?= ... ).) It makes sure that what's inside it can be matched at that position, without consuming any characters, before a match is reported.
For instance, a(?=.*b) matches any a as long as it is followed by any number of characters and then a b (in other words, it matches an a as long as there's a b somewhere after it).
So, href="([^"]*)"> will match only if there is (?:(?!href).)*textABCXYZ123 somewhere ahead.
(?:(?!href).)* is a modified .*, because the negative lookahead (format (?! ... )) makes sure no href is matched. You could say it's the opposite of a positive lookahead:
a(?!.*b) matches any a as long as there is no b anywhere after it.
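To see the pattern in action, here is a minimal sketch. The markup in $html is made up for illustration (the example.com URLs and the anchor tags are assumptions, not the asker's real page), and preg_quote() is added so that special characters in the search string would not be read as regex syntax.

```php
<?php
// Hypothetical markup: the href values below are invented for this example.
$html = '<a href="http://example.com/link/1"><strong>first link</strong></a><br>'
      . '1 points <span class=notes_box>ANOTHERLINK</span>' . "\n"
      . '<a href="http://example.com/link/2"><strong>somewhere to go</strong></a><br>'
      . '1 points <span class=notes_box>textABCXYZ123</span>' . "\n"
      . '<a href="http://example.com/link/3"><strong>other link</strong></a><br>'
      . '1 points <span class=notes_box>ANOTHERLINK</span>';

$find_string = 'ABCXYZ123';

// The lookahead may only cross text that contains no further "href",
// so only the link directly before the marker can satisfy it.
$pattern = '~href="([^"]*)">(?=(?:(?!href).)*text' . preg_quote($find_string, '~') . ')~s';

$link = preg_match($pattern, $html, $match) ? $match[1] : null;
echo $link, "\n";   // http://example.com/link/2
```

The first link is skipped because the lookahead would have to pass over the second href to reach the marker, which (?!href) forbids.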
(?s)href=[^<]+</a>(?!.*(href).*(textABCXYZ123))(?=.*(textABCXYZ123))
You could also try this. Let me know if you want an explanation.
I have a problem: I have been struggling with this regex issue and have already spent 2 hours on it.
Text: what2 when2 not3 not 2 not 2018 not2not
Expected: what-what when-when not3 not 2 not 2018 not2not
I want to match every word that consists of letters followed by the number 2 at the end of the word, and then replace [text]2 with [text]-[text].
Here is my regex. My final script:
$str = 'what2 when2 not3 not 2 not 2018 not2not';
echo preg_replace('/[a-z]+2/i', "$0-$0", $str);
//result: what2-what2 when2-when2 not3 not 2 not 2018 not2-not2not
//expected: what-what when-when not3 not 2 not 2018 not2not
My problems are:
My regex still matches not2not, which shouldn't be included.
I can't drop the number 2 from the matched text ($0). I tried $1 and $2 but still couldn't solve the problem.
Did I miss anything? I'm very bad at regex actually but always want to try learn it.
thanks for any advice
Change your preg_replace function to:
echo preg_replace('/([a-z]+)2/i', "$1-$1", $str);
The $0 refers to the entire match. If you want just the word without the trailing 2, put capturing parentheses around it, as in /([a-z]+)2/i, and then use $1 to grab just that capture, in other words the word without the 2 at the end. This returns:
what-what when-when not3 not 2 not 2018 not-notnot
Next, the final not-notnot appears because you're not checking for a space or the end of the string, so the pattern captures the not2 in not2not. To fix that, you can check for a word boundary afterwards by changing it to: /([a-z]+)2\b/i. The \b matches the position between a word character and a non-word character (or the start or end of the string), so it handles strings like 'what2 yes2' correctly, including the last word.
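A quick sketch of the word-boundary version, run against the string from the question and against a string that ends in a matching word:

```php
<?php
$str = 'what2 when2 not3 not 2 not 2018 not2not';

// \b after the 2 requires the 2 to be the last character of the word.
$result = preg_replace('/([a-z]+)2\b/i', '$1-$1', $str);
echo $result, "\n";   // what-what when-when not3 not 2 not 2018 not2not

// A word ending in 2 at the very end of the string is handled as well.
$tail = preg_replace('/([a-z]+)2\b/i', '$1-$1', 'what2 yes2');
echo $tail, "\n";     // what-what yes-yes
```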
I suggest this solution:
$str = 'what2 when2 not3 not 2 not 2018 not2not';
echo preg_replace('/([a-z]+)2\s/i', "$1-$1 ", $str);
// OUTPUT: what-what when-when not3 not 2 not 2018 not2not
//expected: what-what when-when not3 not 2 not 2018 not2not
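The \s is used to match only full words, not a 2 in the middle of a word.
Without it, you would get a wrong last replacement (not2not). Because \s consumes the space, you have to add it back in the replacement ("$1-$1 "). Note that a matching word at the very end of the string, with no whitespace after it, will not be replaced.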
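One thing to watch with a trailing \s is a matching word at the very end of the string. The sketch below compares it with a lookahead variant, (?=\s|$), which is my own suggestion rather than part of the answer above; it checks for whitespace or the end of the string without consuming anything.

```php
<?php
$str = 'what2 not2not yes2';

// \s must consume a whitespace character, so the trailing yes2 is left alone.
$withSpace = preg_replace('/([a-z]+)2\s/i', '$1-$1 ', $str);

// (?=\s|$) only looks ahead for whitespace or the end of the string.
$withLookahead = preg_replace('/([a-z]+)2(?=\s|$)/i', '$1-$1', $str);

echo $withSpace, "\n";      // what-what not2not yes2
echo $withLookahead, "\n";  // what-what not2not yes-yes
```

With the lookahead, the replacement string no longer needs the extra space.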
I'm trying to grab the Time that will work for the following divs source-code:
<div class="smallfont">
(09-03-2015, 09:16 PM)
</div>
<div class="smallfont">
(Yesterday, 11:11 AM)
</div>
<div class="smallfont">
(Today, 12:10 PM)
</div>
There is a lot of surrounding code, but each block I need begins with "smallfont"> followed by a carriage return/line feed and maybe spaces or tabs, and then an opening (.
I'm really close. I have it working if it's just the 1st scenario with the Date, but it doesn't work if it's Today or Yesterday:
preg_match_all('/smallfont">[\n\r\s\t]+\([0-9\-]+,(.*?)\)/s', $output, $matchesTime);
2nd thing: Is it also possible to write code that'll then loop through and replace the Yesterday and Today with the appropriate date?
Thank you so much!
Use the most consistent parts (09:16 PM, 11:11 AM, 12:10 PM) as an end mark.
Match those with \d+:\d+\s+[AP]M and combine that with the surroundings, like this:
smallfont">\s*\(\s*((?:[^,]+,\s*)?\d+:\d+\s+[AP]M)
\s matches a whitespace and includes \t \r \n already. \d matches a digit.
Use it with preg_match_all. The first capture group contains the matched date strings.
2nd thing: ...replace the Yesterday and Today with the appropriate date
echo date("d-m-Y, h:i A", strtotime("yesterday, 09:16 PM"));
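Putting both parts together might look like the sketch below. The sample $output, the fixed reference time $now, and the d-m-Y output format are assumptions made for the example; in real use you would pass time() instead of a fixed timestamp.

```php
<?php
// Sample input modelled on the divs from the question.
$output = "<div class=\"smallfont\">\r\n   (09-03-2015, 09:16 PM)\r\n</div>\n"
        . "<div class=\"smallfont\">\r\n   (Yesterday, 11:11 AM)\r\n</div>\n"
        . "<div class=\"smallfont\">\r\n   (Today, 12:10 PM)\r\n</div>";

// Hypothetical "current" moment (5 Sep 2015, noon) so the result is reproducible.
$now = mktime(12, 0, 0, 9, 5, 2015);

preg_match_all('/smallfont">\s*\(\s*((?:[^,]+,\s*)?\d+:\d+\s+[AP]M)/', $output, $matches);

$times = [];
foreach ($matches[1] as $stamp) {
    // Swap a leading Today/Yesterday for a concrete date relative to $now.
    $times[] = preg_replace_callback(
        '/^(Today|Yesterday)\b/i',
        function ($m) use ($now) {
            return date('d-m-Y', strtotime($m[1], $now));
        },
        $stamp
    );
}
print_r($times);
```

Entries that already carry an explicit date are left untouched, so the question of whether 09-03-2015 means day-month or month-day never has to be guessed.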
Just list all the possibilities as alternatives alongside [0-9\-]+:
preg_match_all('/<div\b[^>]*\bclass="smallfont">[\n\r\s\t]+\((?:Today|Tomorrow|Yesterday|[0-9\-]+),\s*(.*?)\)/s', $output, $matchesTime);
I have two strings
First One:
Date: Sat, 13 Jun 2015 13:26:05 +0100
Subject: Changing the balance: +50,00 CZK
Dear client,
Second One:
Date: Sat, 14 Jun 2015 14:58:05 +0100
Subject: Changing the balance: +75,00 CZK
Dear client,
And I really don't know what pattern to use if I want to get the number of CZKs from these strings. I need integer 50 from first string and integer 75 from second string (just integer not decimal with ,00).
This can really be as simple or as complex as you need it to be. In its simplest form, you could look for a pattern that reads:
number.comma.number.space.CZK
this can be written as:
[0-9]+,[0-9]+\sCZK
[0-9] is a character range that matches any digit from 0 to 9. The plus character means that at least 1 digit is required. If you wanted to require EXACTLY 2 digits, you could change [0-9]+ to [0-9]{2}
, is a comma...
[0-9]+ is another number (at least 1)
\s is a space
CZK is the string you're wanting to end with
You can expand upon this as you wish. Here is a working example: http://regexr.com/3baog
Edit:
If you wish to capture the 50 / 75, you need to wrap parentheses around the part you're after, e.g.:
([0-9]+),[0-9]+\sCZK
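As a rough sketch, the capturing version can be applied to both messages from the question like this (the $mails array and the use of null for "no amount found" are just choices made for the example):

```php
<?php
$mails = [
    "Date: Sat, 13 Jun 2015 13:26:05 +0100\nSubject: Changing the balance: +50,00 CZK\nDear client,",
    "Date: Sat, 14 Jun 2015 14:58:05 +0100\nSubject: Changing the balance: +75,00 CZK\nDear client,",
];

$amounts = [];
foreach ($mails as $mail) {
    // Group 1 holds the digits before the comma; null marks "no amount found".
    $amounts[] = preg_match('/([0-9]+),[0-9]+\sCZK/', $mail, $m) ? (int) $m[1] : null;
}
var_dump($amounts);   // 50 and 75 as integers
```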
Use positive lookahead to select the integer part of number before CZK
\d+(?=,\d*\s*CZK)
Explanation for the above regex can be seen from this DEMO
If you want to select the sign + or -, then you can add [+-]? at the beginning.
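Here is a small sketch of the lookahead variant with the optional sign. The helper name extractCzk() is hypothetical, and the negative subject line is an invented example to show that the sign survives the integer cast.

```php
<?php
// Hypothetical helper: returns the signed integer part before ",.. CZK", or null.
function extractCzk(string $text): ?int
{
    // The lookahead checks for the decimals and "CZK" without making them part of the match.
    if (preg_match('/[+-]?\d+(?=,\d*\s*CZK)/', $text, $m)) {
        return (int) $m[0];
    }
    return null;
}

var_dump(extractCzk('Subject: Changing the balance: +50,00 CZK'));  // int(50)
var_dump(extractCzk('Subject: Changing the balance: -75,00 CZK'));  // int(-75)
var_dump(extractCzk('Subject: No balance change'));                 // NULL
```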
$string="18 Mar 2013 <b>...</b> And this is exactly what is sparking the resurgence of long-tail <br> <b>keyword</b> targeting in <b>SEO</b>. I've observed this trend among both young <b>...</b>";
$string="May be <b>google</b> not considering Meta <b>keywords</b> for his searching, but meta <br> descriptions play a vital role in your <b>SEO</b> practices,even including <b>...</b>";
$string="7 Jun 2010 <b>...</b> Picking <b>SEO Keywords</b>: Using <b>Google's</b> Wonder Wheel. This is in my opinion the <br> best little secret of everyone's favorite search engine: the <b>...</b>";
$string="For search engine marketers -- and the companies who depend on them -- things <br> just got a little tougher. <b>SEO</b> companies, most still reeling <b>...</b>";
$clean_string = preg_replace("need this here remove date regex syntax", "", $string);
I need a regex or a small piece of code. See the sample code above: I want to remove the date at the beginning of the text. The text and the dates vary, and some strings do not start with a date at all, as in the examples. Thank you in advance for your help, friends.
This should suffice for the date format shown in your examples:
$result = preg_replace(
'/^ # Start of string
\d{1,2} # 1-2 digits
\s+ # whitespace
[a-z]+ # 1 or more ASCII letters
\s+ # whitespace
\d{2,4} # 2-4 digits
\s* # optional whitespace/ix',
'', $subject);
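For a quick check, the same pattern written on one line can be run over shortened versions of the sample strings. The strings are truncated here only to keep the example compact; preg_replace() accepts an array of subjects and returns an array.

```php
<?php
$strings = [
    '18 Mar 2013 <b>...</b> And this is exactly what is sparking the resurgence',
    'May be <b>google</b> not considering Meta <b>keywords</b>',
    '7 Jun 2010 <b>...</b> Picking <b>SEO Keywords</b>',
];

// Same pattern as above, without the x-mode comments.
$clean = preg_replace('/^\d{1,2}\s+[a-z]+\s+\d{2,4}\s*/i', '', $strings);
print_r($clean);
```

Strings that do not start with a date, like the second one, come back unchanged because the pattern is anchored with ^ and requires leading digits.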
How can I replace a date/time in the format 'Fri Mar 23 15:21:08 2012' with preg_replace?
A date in this format appears a couple of times in my text, and I need to replace it with the current date/time.
Thanks,
Chris
Well, what you need is an expression that will match 3 letters (Fri) followed by a space and another three letters (Mar).
First we need to match some letters:
/[a-z]/
We can match exactly 3 letters like this:
/[a-z]{3}/
...and we'll need it to be case insensitive:
/[a-z]{3}/i
...so the first part is just:
/[a-z]{3} [a-z]{3}/i
Next, we need to match either 1 or 2 numerics. A numeric can be represented with the escape sequence \d, so we'd use:
/\d{1,2}/
Next we match the time string, using the same escape sequence:
/\d{2}:\d{2}:\d{2}/
...followed by a final 4 digit year:
/\d{4}/
Put it all together and we get:
/[a-z]{3} [a-z]{3} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}/i
// Fri Mar 23 15 : 21 : 08 2012
Now, we need to replace it with the current date and time. The usual place we'd go for that is the date() function, but how do we get that into the replacement dynamically? Well, we could pass it as a string literal, or we could use a callback function with preg_replace_callback(). In older PHP versions, preg_replace() also gives us the e modifier, which causes the replacement string to be evaluated as PHP code. We have to be careful and sparing with its use, as with any PHP eval(), but this is a legitimate use case on versions that support it. Note that the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so on current versions preg_replace_callback() is the way to go.
So our final PHP code looks like this:
preg_replace(
'/[a-z]{3} [a-z]{3} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}/ie',
"date('D M j H:i:s Y')",
$str
);
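On PHP 7 and later the e modifier is gone, so the same idea has to be expressed with preg_replace_callback(). The following is a sketch only: replaceDates() is a hypothetical helper name, and the optional $timestamp parameter exists just so the output can be pinned down when testing.

```php
<?php
// Hypothetical helper; $timestamp defaults to "now" but can be fixed for testing.
function replaceDates(string $text, ?int $timestamp = null): string
{
    $timestamp = $timestamp ?? time();

    return preg_replace_callback(
        '/[a-z]{3} [a-z]{3} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}/i',
        // Every matched date is swapped for the formatted timestamp.
        function (array $m) use ($timestamp): string {
            return date('D M j H:i:s Y', $timestamp);
        },
        $text
    );
}

$str = 'Started Fri Mar 23 15:21:08 2012, finished Sat Mar 24 09:05:10 2012.';
echo replaceDates($str), "\n";
```

Unlike the e modifier, the callback never evaluates text from the subject string as code.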
See it working
I think listing the finite sets of options is better for this kind of task, and it will also save you from false positives. These are the patterns to match each part of the date format:
Days: (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)
Months: (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
Day: \d{1,2}
Time: \d{1,2}:\d{2}:\d{2}
Year: \d{4}
Putting everything together:
(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2} \d{1,2}:\d{2}:\d{2} \d{4}
The code might look like:
$current_date = date('D M j H:i:s Y');
$text = preg_replace(
'/(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2} \d{1,2}:\d{2}:\d{2} \d{4}/i',
$current_date,
$text
);
See a working example.
preg_replace('/Fri Mar 23 15:21:08 2012/',date('D M d H:i:s Y'),$string);
That should normally do what you want.