I'm a regex newbie, please help me out. The string below occurs in one document:
not_unique\">20,000 miles under sea
I need to extract the number. The sequence "not_unique" is not unique: it may occur several times in the document before this sample. The part "miles under sea" is unique within the document and can be used as the ending delimiter.
I tried something like this in PHP, but it didn't work for me:
if (preg_match('/(?=.*?miles under sea)(?!.+?not_unique)not_unique/', $document, $regs)) {...}
Please help!
How about something like this?
<?php
$document = "blah blah blah sjhsdijf not_unique\">20,000 miles under sea</a> jkdjksds sdsjdlksdsd k skdjsld sd";
// "the" is optional; also accepts 'leagues' instead of 'miles'
preg_match("/([0-9,]{1,6})\s?(miles|leagues)\sunder(\sthe)?\ssea/i", $document, $matches);
print_r($matches);
?>
/not_unique[^>]*>\s*([0-9,]+)\s*miles under sea/
should do it.
This should do the trick:
preg_match_all('/[0-9,]+ miles under sea/i', 'not_unique\">20,000 miles under sea', $result); // find all occurrences of the pattern
$tempval = end($result[0]); // the full matches are in $result[0]; take the last one
$endresult = substr($tempval, 0, strlen($tempval) - 16); // drop the 16-character ending string " miles under sea"
If needed - replace 16 with the exact length of the ending string.
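A minimal sketch of the capture-group alternative, assuming the number always sits directly before the unique phrase "miles under sea". The sample document and variable names are made up for illustration. Capturing the digits avoids the substr() arithmetic altogether:

```php
<?php
// Hypothetical sample: "not_unique" occurs twice, the ending phrase only once.
$document = 'x not_unique">12 y not_unique">20,000 miles under sea</a> z';

// Capture the digits and commas that directly precede the unique ending phrase.
if (preg_match('/([0-9,]+)\s*miles under sea/', $document, $matches)) {
    $raw = $matches[1];                         // "20,000"
    $number = (int) str_replace(',', '', $raw); // 20000
    echo $raw, "\n", $number, "\n";
}
```

Because the ending phrase is unique, the earlier "not_unique" occurrences do not need to be handled at all.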
I have tried many PHP functions like strpos() and preg_match(), but none of them works. I have a string (the one in the code below)
and I want to extract only the four-digit number, which is 1234.
<?php
$texxt="abcd1245 784563 1234 98756 kfg7456178";
$results=array();
preg_match('/[0-9]{4}/', $texxt, $results);
print_r($results);
?>
But the above code returns 1245 instead of 1234. If I remove the abcd1245, then the output is 7845. The actual string is very large; it contains more than 200 numbers like the ones above. I want only the number that is exactly 4 digits long. Is there any way to solve this?
You need to place boundaries on both sides of your pattern.
\b\d{4}\b
An alternative would be to use \s instead of \b for whitespace - because boundaries will match other non-alphanumeric characters. Depends on exactly what you're looking for.
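To illustrate the difference, here is a small sketch. It swaps the plain \s for whitespace lookarounds, (?<!\S) and (?!\S), which is my substitution rather than the answer's exact pattern; lookarounds also work at the very start and end of the string. The token "12-3456-78" is made up to show where the two patterns disagree:

```php
<?php
// "12-3456-78" is a made-up token to show the difference between the two patterns.
$text = 'abcd1245 784563 1234 12-3456-78 kfg7456178';

// \b only requires a word/non-word transition, so the hyphens count as boundaries.
preg_match_all('/\b\d{4}\b/', $text, $byBoundary);

// Lookarounds require whitespace (or the string edge) on both sides instead.
preg_match_all('/(?<!\S)\d{4}(?!\S)/', $text, $byWhitespace);

print_r($byBoundary[0]);   // 1234 and 3456
print_r($byWhitespace[0]); // 1234 only
```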
Since you said you have more than 200 numbers, use the code below:
<?php
$texxt="abcd1245 784563 1234 3421 98756 kfg7456178";
$results=array();
preg_match_all('/\b\d{4}\b/', $texxt, $results);
print_r($results);
?>
preg_match stops after the first occurrence, whereas preg_match_all finds all occurrences.
For an explanation of the regex, please refer to the documentation.
I have a huge string from which I need to separate information. Some parts of it vary and some don't. The difficulty I am facing is that I can't find a symbol or anything else to anchor the match I want on. So here is the string:
$str = "01;01;283;Póvoa do Vâle do Trigo;15315100 01;01;249;Alcafaz;;;;;;;;;;;3750;011;AGADÃO 01;01;2504;Caselho;;;;;;;;;;;3750;012;AGADÃO _ "15" '' ghdhghg AND IT CONTINUES
So if we look at the first part of the string (01;01;283;Póvoa do Vâle do Trigo;15315100), what I want to keep is:
01;01;283
and remove the rest.
This holds in every case; looking at the first example:
the 01 is always a number with at most 2 digits (not 040 or 150505 or 4075)
the same goes for the next 01: at most 2 digits (not 405 or 1565 or 425)
then the 283 is the number that can be longer; it varies (it can be 300 or 17581 or 40755794)
Essentially, in the end I want only the beginning of each part, like:
01;01;283
01;01;249
01;01;2504
05,80,104258
94,76,56789124
Sorry for any misspellings, I am Portuguese.
I forgot to say that these separated parts will then go into an array! So the regular expression should not match, for example, this:
15315100 01;01;249
so I can't use .+ for example.
I am using preg_replace.
Try this:
/(\d+;\d+;\d+)/
Should work.
Try the following. The regex is in the preg_match_all line.
$str = "***01;01;283***;Póvoa do Vâle do Trigo;15315100 ***01;01;249***;Alcafaz;;;;;;;;;;;3750;011;AGADÃO ";
preg_match_all("/\*\*\*[01][0-9];[01][0-9];[0-9]*\*\*\*.*?/", $str, $matches);
print_r($matches);
((?:\d\d;){2}\d+)
And maybe it would be easier to just get everything between ***XXX***
\*([\d;]+)\*
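A sketch that combines the constraints from the question, assuming records are separated by whitespace and that the first two fields have at most two digits. The sample is a shortened version of the question's string. The lookbehind keeps a trailing field such as 15315100 from being picked up as the start of a prefix:

```php
<?php
// Shortened sample in the question's format; records are separated by a space.
$str = '01;01;283;Póvoa do Vâle do Trigo;15315100 01;01;249;Alcafaz;;;;;;;;;;;3750;011;AGADÃO 01;01;2504;Caselho;;;;;;;;;;;3750;012;AGADÃO';

// (?<!\S)  : the prefix must start the string or follow whitespace
// \d{1,2}; : the first and second fields have at most two digits
// \d+      : the third field can be any length
preg_match_all('/(?<!\S)\d{1,2};\d{1,2};\d+/', $str, $matches);

print_r($matches[0]); // 01;01;283, 01;01;249, 01;01;2504
```

preg_match_all already collects the prefixes into an array, so no preg_replace pass is needed for this variant.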
I am looking to develop a search function that allows users to just search for the item, or modify their search with a price range in brackets. So that is to say if they are looking for a car, then they can enter either car and receive all cars in the database or they can enter car (100, 299) or car(100, 299) and receive only cars in the database with the price range of 100 to 299.
Before, what I did was three different explode function calls, but that was cumbersome and looked ridiculously ugly. I also tried to put the brackets in an array and then compare that against the word searched (a word is basically an array of characters), but that didn't work. Finally, I have been reading up on strpos and substr, but they don't seem to fit the requirements, as strpos returns the first occurrence of the character and substr returns the characters within a specified length after a specific occurrence.
So for example the problem with strpos is the user can just enter ( and no ) bracket and I'll make a call to my search function with who knows what. And for example the problem with substr is that the price range can vary wildly.
You can use preg_match to parse the search string - I'm assuming that's the part you're having issues with.
if (preg_match('/car ?\(([^,]+), ?([^\)]+)\)/', $search_text, $matches)) {
$low_price = $matches[1];
$high_price = $matches[2];
//do your price filtering here
}
The regular expression may need a little tweaking. Parentheses do not need to be escaped inside a character class, though escaping them is harmless.
Yes, Sam is right. You should do this with regular expressions.
Look up preg_match() in the documentation.
To complete his answer, the regular expression for your case is:
$regex = '/^([a-zA-Z]+)\s*\(\s*([0-9]+)\s*,\s*([0-9]+)\s*\)$/';
if (preg_match($regex, $search_text, $matches)) {
$type = $matches[1];
$low_price = $matches[2];
$high_price = $matches[3];
//do your price filtering here
}
Be careful: $matches[0] holds the entire matched string, so the captured groups start at index 1, not 0.
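For the case where the price range is optional, a sketch along these lines might work. parseSearch() is a hypothetical helper name, and the pattern assumes the item is a single word and the prices are whole numbers:

```php
<?php
// parseSearch() is a hypothetical helper name, not part of any library.
// Returns [item, low, high]; low and high are null when no range was given.
// Returns null when the input is malformed, e.g. an unclosed bracket.
function parseSearch(string $search): ?array
{
    $pattern = '/^\s*([a-z]+)\s*(?:\(\s*(\d+)\s*,\s*(\d+)\s*\))?\s*$/i';
    if (!preg_match($pattern, $search, $m)) {
        return null;
    }
    // Groups 2 and 3 are absent from $m when the optional range did not match.
    $low  = isset($m[2]) ? (int) $m[2] : null;
    $high = isset($m[3]) ? (int) $m[3] : null;
    return [$m[1], $low, $high];
}

var_dump(parseSearch('car'));            // item only
var_dump(parseSearch('car (100, 299)')); // item with range
var_dump(parseSearch('car(100, 299)'));  // same, without the space
var_dump(parseSearch('car (100'));       // null: closing bracket missing
```

Anchoring with ^ and $ is what rejects input with a missing bracket, instead of letting it fall through to the search with a half-parsed range.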
Currently I am developing a web application that fetches a Twitter stream, and I am trying to build my own natural language processing.
Since my data is from Twitter (limited to 140 characters), many words are shortened or, as in this case, the space between them is omitted.
For example:
"Hi, my name is Bob. I m 19yo and 170cm tall"
Should be tokenized to:
- hi
- my
- name
- bob
- i
- 19
- yo
- 170
- cm
- tall
Notice that 19 and yo in 19yo have no space between them. I use it mostly for extracting numbers with their units.
Simply put, what I need is a way to 'explode' each token that contains a number into chunks of digits or letters, without a delimiter.
'123abc' will be ['123', 'abc']
'abc123' will be ['abc', '123']
'abc123xyz' will be ['abc', '123', 'xyz']
and so on.
What is the best way to achieve it in PHP?
I found something close to it, but it's C# and specifically for day/month splitting: How do I split a string in C# based on letters and numbers
You can use preg_split
$string = "Hi, my name is Bob. I m 19yo and 170cm tall";
$parts = preg_split("/(,?\s+)|((?<=[a-z])(?=\d))|((?<=\d)(?=[a-z]))/i", $string);
var_dump ($parts);
When matching against the digit-letter boundary, the regular expression match must be zero-width. The characters themselves must not be included in the match. For this the zero-width lookarounds are useful.
http://codepad.org/i4Y6r6VS
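The same zero-width lookarounds can be applied to a single token. A small sketch, where splitAlnum() is a hypothetical helper name and only ASCII letters are assumed:

```php
<?php
// splitAlnum() is a hypothetical helper name used only for this illustration.
function splitAlnum(string $token): array
{
    // Split at every zero-width position between a letter and a digit (either order).
    return preg_split('/(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])/i', $token);
}

print_r(splitAlnum('123abc'));    // 123, abc
print_r(splitAlnum('abc123'));    // abc, 123
print_r(splitAlnum('abc123xyz')); // abc, 123, xyz
```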
How about this:
Extract the numbers from the string using regexps and store them in an array, then replace the numbers in the string with some kind of special placeholder that will 'hold' their position. After parsing the string, which now consists only of your placeholders and normal characters, put the numbers from the array back into their reserved places.
Just an idea, but IMHO it might work for you.
EDIT:
Try running this short code; hopefully you will see my point in the output. (This code doesn't work on codepad, I don't know why.)
<?php
$str = "Hi, my name is Bob. I m 19yo and 170cm tall";
preg_match_all("#\d+#", $str, $matches);
$str = preg_replace("!\d+!", "#SPEC#", $str);
print_r($matches[0]);
print $str;
$str = 'title="room 5 stars"';
preg_match_all('/title="([0-9]+)"/sm', $str, $rate);
I need to grab number 5 from title. The regex doesn't work!
If I do this:
preg_match_all('/title="([0-9]+)"/sm', $str, $rate);
I get:
room 5 stars
However, this one doesn't return anything:
'/title="([0-9]+)"/sm'
Where did I go wrong?
You're not taking into account the words around the number, try this:
$str = 'title="room 5 stars"';
preg_match_all('/title=".*(\d+).*"/', $str, $rate);
// The number is then in $rate[1][0];
You forgot to match the text before and after your number.
Try with: /title=".*([0-9]+).*"/
PS: you don't need the m and s options.
* is greedy, so .* consumes as much as it can; with a multi-digit number it leaves only the last digit for the capture group.
You can use /title=".*?(\d+).*?"/ instead, which is a lazy match and consumes as few characters as possible.
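A short comparison of the greedy and lazy patterns, using a made-up title with a two-digit number. The third pattern is an extra variant of mine, not from the answers above, that cannot run past the closing quote:

```php
<?php
// "room 15 stars" is a made-up value with a two-digit number.
$str = 'title="room 15 stars"';

// Greedy: .* grabs as much as it can, leaving only the last digit for (\d+).
preg_match('/title=".*(\d+).*"/', $str, $greedy);

// Lazy: .*? stops at the first digit, so (\d+) captures the whole number.
preg_match('/title=".*?(\d+).*?"/', $str, $lazy);

// Stricter variant: [^"\d]* cannot run past the closing quote or skip digits.
preg_match('/title="[^"\d]*(\d+)/', $str, $strict);

echo $greedy[1], "\n"; // 5
echo $lazy[1], "\n";   // 15
echo $strict[1], "\n"; // 15
```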
You can also try this free tool for regex matching: RegExr