preg_match from some html code

How would I write a preg_match() call in PHP to pick out the 250 value? I have a large string of HTML that I want to pick the 250 out of, and I can't seem to get the regular expression right.
This is the HTML pattern I want to match. Note that I want to extract the integer where the 250 is:
<span class="price-ld">H$250</span>
I have been trying for hours to do this and I can't get it to work lol

preg_match('/<span class="price-ld">H\$(\d+)<\/span>/i', $your_html, $matches);
print "It's ".$matches[1]." USD";
Note the escaped \$: an unescaped $ is an end-of-line anchor and would never match here. The exact regex depends on your markup. Where exactly are you searching?

This is the regex you're looking for:
(?<=<span class="price-ld">H\$)\d+(?=</span>)
You can see the results here.
And here's the explanation:
Options: case insensitive; ^ and $ match at line breaks
Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=<span class="price-ld">H\$)»
Match the characters “<span class="price-ld">H” literally «<span class="price-ld">H»
Match the character “$” literally «\$»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=</span>)»
Match the characters “</span>” literally «</span>»
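As a rough sketch, this is how the lookaround pattern above might be used from PHP. The $html sample and the choice of ~ as delimiter are assumptions for illustration, not part of the original answer:

```php
<?php
// Hypothetical sample input; the real page markup may differ.
$html = '<div><span class="price-ld">H$250</span></div>';

// "~" is used as the delimiter so the "/" in "</span>" needs no escaping.
$pattern = '~(?<=<span class="price-ld">H\$)\d+(?=</span>)~i';

$price = null;
if (preg_match($pattern, $html, $m)) {
    // With lookarounds the whole match ($m[0]) is just the digits.
    $price = (int) $m[0];
}

var_dump($price); // int(250)
```

Because the lookbehind and lookahead consume nothing, no capturing group is needed; the trade-off is that PCRE requires the lookbehind to have a fixed length, which this literal prefix does.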

Related

regex match sub-string before sequence of numbers

I only want to match the - and the space after it, which come before the 4 digits. I wrote the following regex to try to match these characters: ^(- )+?(?=\d{4})$
If I try this regex on the string below, I get no matches.
- 7575
What am I doing wrong?
I am quite new to regex.
Thanks in advance.

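What your regex actually does is:
^(- )+? => match one or more repetitions of "- " at the start of the string
(?=\d{4}) => which has to be followed by 4 digits, without consuming them
$ => then the end of the string
So it can never match: right after the lookahead the 4 digits are still unconsumed, yet $ requires the end of the string at that same position.
If you don't want to match the digits, either put the end anchor inside the positive lookahead, like
^(- )+?(?=\d{4}$)
or remove the positive lookahead (the digits then become part of the match), like
^(- )+?\d{4}$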
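A quick way to check the three variants in PHP, as a small sketch using the sample string from the question (the variable names are made up for illustration):

```php
<?php
$input = '- 7575';

// Original: "$" right after the lookahead demands end-of-string
// at a position where four digits still follow, so it cannot match.
$original = preg_match('/^(- )+?(?=\d{4})$/', $input);

// Fix 1: "$" moved inside the lookahead; only "- " is consumed.
preg_match('/^(- )+?(?=\d{4}$)/', $input, $lookahead);

// Fix 2: no lookahead; the digits become part of the match.
preg_match('/^(- )+?\d{4}$/', $input, $consumed);

var_dump($original);     // int(0)
var_dump($lookahead[0]); // string(2) "- "
var_dump($consumed[0]);  // string(6) "- 7575"
```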

Regex to get the first number after a certain string followed by any data until the number

I have a piece of data, retrieved from the database, that contains information I need. The text is entered in free form, so it's written in many different ways. The only thing I know for sure is that I'm looking for the first number after a given string, but between that string and the number there can be any text as well.
I tried these (where mytoken is the string I know for sure is there), but none of them work.
/(mytoken|MYTOKEN)(.*)\d{1}/
/(mytoken|MYTOKEN)[a-zA-Z]+\d{1}/
/(mytoken|MYTOKEN)(.*)[0-9]/
/(mytoken|MYTOKEN)[a-zA-Z]+[0-9]/
Also, mytoken can be written in capitals, lowercase, or a mix of both. Can the expression be case insensitive?

You do not need any lazy matching since you want to match any number of non-digit symbols up to the first digit. It is better done with a \D*:
/(mytoken)(\D*)(\d+)/i
See the regex demo
The pattern details:
(mytoken) - Group 1 matching mytoken (case insensitively, as there is a /i modifier)
(\D*) - Group 2 matching zero or more characters other than a digit
(\d+) - Group 3 matching 1 or more digits.
Note that \D also matches newlines, whereas . needs the DOTALL modifier (/s) to match across newlines.
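A minimal sketch of that pattern in PHP. The sample text is invented, and "mytoken" stands in for whatever marker string you actually have:

```php
<?php
// Hypothetical free-form text with mixed case and a line break
// between the token and the first number.
$text = "Order ref MyToken: approx.\nqty 42 units, batch 7";

$number = null;
if (preg_match('/(mytoken)(\D*)(\d+)/i', $text, $m)) {
    // Group 3 holds the first run of digits after the token.
    $number = $m[3];
}

var_dump($number); // string(2) "42"
```

Since \D matches the newline as well, no extra modifier is needed for the token and the number to sit on different lines.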

You need to use a lazy quantifier. You can do that by putting a question mark after the star quantifier in the regex: .*?. Otherwise the greedy .* also consumes the digits and only gives back the last one, which is then matched by \d.
Regex: /(mytoken|MYTOKEN)(.*?)\d/
Regex demo

You can use the opposite:
/(mytoken|MYTOKEN)(\D+)(\d)/
This says: mytoken, followed by one or more non-digit characters, followed by a digit. The (lazy) dot-star-soup is not always your best bet. The digit will be in group 3 in this example; use (\d+) instead of (\d) to capture the whole number.
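To see how the greedy, lazy, and negated-class approaches differ, here is a small comparison sketch. The sample text is hypothetical, and the patterns are simplified versions of the ones in the answers:

```php
<?php
$text = 'mytoken abc 12 and 34'; // hypothetical sample

// Greedy: .* runs to the end of the line, then backtracks one character.
preg_match('/mytoken(.*)(\d)/i', $text, $greedy);

// Lazy: .*? stops as soon as a digit can be matched.
preg_match('/mytoken(.*?)(\d)/i', $text, $lazy);

// Negated class: \D+ cannot cross a digit, and \d+ takes the whole number.
preg_match('/mytoken(\D+)(\d+)/i', $text, $negated);

var_dump($greedy[2]);  // string(1) "4"
var_dump($lazy[2]);    // string(1) "1"
var_dump($negated[2]); // string(2) "12"
```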

Regex to return different length sentences

I'm trying to match sentences of different lengths with digits at the beginning.
How can I do this and return matches of different lengths?
eg "2341' Macbeth",
"2354' The Hunger Games",
"1236' Crimson Peak"
preg_match_all("d+\\'\s\w+\s\w+(?(?=w))~", $string, $array);
Clearly I'm new to regex and programming in general, any responses would be greatly appreciated.
Thank you.

You can just use \d+.
working demo
If you want to capture then use capturing groups
(\d+)
Match information
MATCH 1
1. [0-4] `2341`
MATCH 2
1. [17-21] `2354`
MATCH 3
1. [43-47] `1236`
By the way, if your input spans multiple lines, add ^ at the beginning (together with the m modifier) to match only lines starting with numbers:
^(\d+)
Working demo

I guess this will work
/^\d+.*?$/
DEMO
https://regex101.com/r/fY9yA5/1
REGEX EXPLANATION
^\d+.*?$
Assert position at the beginning of a line «^»
Match a single character that is a “digit” «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match any single character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Assert position at the end of a line «$»
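Here is a sketch of how this could look with preg_match_all() in PHP. It assumes one title per line and adds the m modifier so that ^ and $ work per line. The second pattern, which splits number and title, is a variation of my own rather than something from the answers:

```php
<?php
// Hypothetical input: one entry per line, as in the question's examples.
$string = "2341' Macbeth\n2354' The Hunger Games\n1236' Crimson Peak";

// Whole lines that start with digits; "m" makes ^ and $ line-aware.
preg_match_all('/^\d+.*?$/m', $string, $lines);

// Variation: capture the number and the title separately.
preg_match_all('/^(\d+)\'\s+(.+)$/m', $string, $parts);

print_r($lines[0]); // the three full lines
print_r($parts[1]); // 2341, 2354, 1236
print_r($parts[2]); // Macbeth, The Hunger Games, Crimson Peak
```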

Remove number from large string in specific position [PHP RegEx]

I have a large string (multiple lines) in which I need to find numbers with regex. The number I need is always preceded and followed by an exact sequence of characters, so I can use non-capturing groups to pinpoint the exact number I need. I put together a regex to get this number, but it refuses to work and I can't figure out why!
Below is a small bit of PHP code that I can't get to work, showing the basic format of what I need:
$sTestData = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
$sNumberStripRE = '/.*?(?:sjdhfklsjaf<\\?kjnsdfh)(\\d+)(?:uihrfkjsn\\+%5Bmlknsadlfjncas).*?/gim';
if (preg_match_all($sNumberStripRE, $sTestData, $aMatches))
{
var_dump($aMatches);
}
The number I need is 461, and the characters on either side of this number (between the surrounding spaces) are always the same.
Any help getting the above regex working would be great!
The RegExr link (an online regex tester loaded with my regex) shows that it should work!

g is not a valid modifier in PHP's preg functions, so drop it. preg_match_all() already finds every match.
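For reference, a sketch of the question's code with only the g flag removed. With the flag present, PHP raises an "Unknown modifier" warning and the call does not match anything:

```php
<?php
$sTestData = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';

// Same pattern as in the question, minus the unsupported "g" flag.
$sNumberStripRE = '/.*?(?:sjdhfklsjaf<\\?kjnsdfh)(\\d+)(?:uihrfkjsn\\+%5Bmlknsadlfjncas).*?/im';

$number = null;
if (preg_match_all($sNumberStripRE, $sTestData, $aMatches)) {
    // $aMatches[1] lists every capture of group 1; take the first.
    $number = $aMatches[1][0];
}

var_dump($number); // string(3) "461"
```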

With regard to that link, which regular expression engine does it use? It is built in Flex, so probably the ActionScript RegExp engine. The engines are not all the same; each one varies.
You have a number of double backslashes; they can be single in those single-quoted strings, though PHP reads both forms as the same pattern.

$sTestData = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
$lDelim = ' sjdhfklsjaf<?kjnsdfh';
$rDelim = 'uihrfkjsn+%5Bmlknsadlfjncas ';
$start = strpos($sTestData, $lDelim) + strlen($lDelim);
$length = strpos($sTestData, $rDelim) - $start;
$number = substr($sTestData, $start, $length);
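The snippet above assumes both delimiters are always present. A slightly more defensive sketch might wrap it in a helper; extractBetween() is a hypothetical name, and the nullable type hints assume PHP 7.1 or later:

```php
<?php
/**
 * Hypothetical helper: returns the text between two delimiters,
 * or null if either delimiter is missing.
 */
function extractBetween(string $haystack, string $left, string $right): ?string
{
    $leftPos = strpos($haystack, $left);
    if ($leftPos === false) {
        return null;
    }
    $start = $leftPos + strlen($left);

    // Search for the right delimiter only after the left one.
    $rightPos = strpos($haystack, $right, $start);
    if ($rightPos === false) {
        return null;
    }
    return substr($haystack, $start, $rightPos - $start);
}

$sTestData = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
$number = extractBetween($sTestData, ' sjdhfklsjaf<?kjnsdfh', 'uihrfkjsn+%5Bmlknsadlfjncas ');

var_dump($number); // string(3) "461"
```

Checking strpos() against false with === matters here, because a delimiter found at offset 0 would otherwise be mistaken for "not found".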

Using regex you can accomplish your goal with the following code:
$string = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
if (preg_match('/(sjdhfklsjaf<\?kjnsdfh)(\d+)(uihrfkjsn\+%5Bmlknsadlfjncas)/', $string, $num_array)) {
    $aMatches = $num_array[2]; // group 2 holds the digits
} else {
    $aMatches = "";
}
echo $aMatches;
Explanation:
I declared a variable entitled $string and made it equal to the variable you initially presented. You indicated that the characters on either side of the numeric value of interest were always the same. I assigned the numerical value of interest to $aMatches by setting $aMatches equal to back reference 2. Using the parentheses in regex you will get 3 matches: backreference 1 which will contain the characters before the number, backreference 2 which will contain the numbers that you want, and backreference 3 which is the stuff after the number. I assigned $num_array as the variable name for those backreferences and the [2] indicates that it is the second backreference. So, $num_array[1] would contain the match in backreference 1 and $num_array[3] would contain the match in backreference 3.
Here is the explanation of my regular expression:
Match the regular expression below and capture its match into backreference number 1 «(sjdhfklsjaf<\?kjnsdfh)»
Match the characters “sjdhfklsjaf<” literally «sjdhfklsjaf<»
Match the character “?” literally «\?»
Match the characters “kjnsdfh” literally «kjnsdfh»
Match the regular expression below and capture its match into backreference number 2 «(\d+)»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below and capture its match into backreference number 3 «(uihrfkjsn\+%5Bmlknsadlfjncas)»
Match the characters “uihrfkjsn” literally «uihrfkjsn»
Match the character “+” literally «\+»
Match the characters “%5Bmlknsadlfjncas” literally «%5Bmlknsadlfjncas»
Hope this helps and best of luck to you.
Steve

What does this regex mean in PHP?

/(?![a-z]+:)/
Does anyone know?

the / are delimiters.
?! is negative lookahead.
[a-z] is a character class (any character in the a-z range)
+ is one-or-more times of the preceding pattern ([a-z] in this case)
: is just the colon literal
It roughly means "look ahead and make sure there are no alpha characters followed by a colon".
This regex would make more sense if it had a start-of-string anchor: /^(?![a-z]+:)/, so it wouldn't match abc: (as one of the other answers says), but without the ^ I don't know how useful this is.

According to RegexBuddy (a product I highly recommend):
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?![a-z]+:)»
Match a single character in the range between “a” and “z” «[a-z]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “:” literally «:»

(?!REGEX) is the syntax for negative lookahead. Check the link for an explanation of lookaheads.
The lookahead fails if the pattern [a-z]+: can be matched starting at the current position. If the pattern is not found there, the regex succeeds but doesn't consume any characters.
At the start of the string it would match 123: or abc but not abc:
Further along, it would still match at the : in abc: (as an empty match).
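A short sketch to check these claims in PHP, comparing the original pattern with an anchored variant (the anchored form and the test strings are assumptions for illustration):

```php
<?php
$unanchored = '/(?![a-z]+:)/';
$anchored   = '/^(?![a-z]+:)/';

$results = [];
foreach (['abc:', 'abc', '123:'] as $subject) {
    $results[$subject] = [
        'unanchored' => preg_match($unanchored, $subject),
        'anchored'   => preg_match($anchored, $subject),
    ];
}

// Where does the unanchored pattern match inside "abc:"?
preg_match($unanchored, 'abc:', $m, PREG_OFFSET_CAPTURE);
$offset = $m[0][1]; // 3: an empty match just before the ":"

print_r($results);
var_dump($offset); // int(3)
```

The unanchored pattern reports a match for all three strings, because the engine can always slide forward to a position where the lookahead succeeds. Only the anchored version rejects abc:.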
