Get word from string - PHP

I am trying to extract a word that matches a specific pattern from various strings.
The strings vary in length and content.
For example:
I want to extract any word that begins with jac from the following strings and populate an array with the full words:
I bought a jacket yesterday.
Jack is going home.
I want to go to Jacksonville.
The resulting array should be [jacket,Jack,Jacksonville]
I have been trying to use preg_match() but for some reason it won't work. Any suggestions???
$q = "jac";
$str = "jacket";
preg_match($q,$str,$matches);
print $matches[1];
This returns null :S. I dunno what the problem is.

You can use preg_match as:
preg_match("/\b(jac.+?)\b/i", $string, $matches);
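For context, the call in the question fails because PCRE patterns in PHP must be wrapped in delimiters; "jac" has none, so preg_match() emits a warning and returns false. There is also no capturing group, so $matches[1] would never be set. A minimal sketch that collects all three words follows. It uses \bjac\w* rather than the exact pattern above, and reads the full matches from $matches[0]; the $lines array is just the sample data from the question.

```php
<?php
// Sample sentences taken from the question.
$lines = [
    "I bought a jacket yesterday.",
    "Jack is going home.",
    "I want to go to Jacksonville.",
];

$words = [];
foreach ($lines as $line) {
    // The slashes are the delimiters; the trailing "i" makes the match case-insensitive.
    // \b anchors the match at the start of a word, \w* takes the rest of the word.
    if (preg_match_all('/\bjac\w*/i', $line, $matches)) {
        // $matches[0] holds the full matches (the pattern has no capturing group).
        $words = array_merge($words, $matches[0]);
    }
}

print_r($words); // jacket, Jack, Jacksonville
```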

You've got to read the manual a few hundred times and it will eventually come to you.
Otherwise, what you're trying to capture can be expressed as "look for 'jac' followed by 0 or more letters* and make sure it's not preceded by a letter" which gives you: /(?<!\\w)(jac\\w*)/i
Here's an example with preg_match_all() so that you can capture all the occurrences of the pattern, not just the first:
$q = "/(?<!\\w)(jac\\w*)/i";
$str = "I bought a jacket yesterday.
Jack is going home.
I want to go to Jacksonville.";
preg_match_all($q,$str,$matches);
print_r($matches[1]);
Note: by "letter" I mean any "word character." Officially, it includes numbers and other "word characters." Depending on the exact circumstances, one may prefer \w (word character) or \b (word boundary.)
You can include extra characters by using a character class. For instance, in order to match any word character as well as single quotes, you can use [\w'] and your regexp becomes:
$q = "/(?<!\\w)(jac[\\w']*)/i";
Alternatively, you can add an optional 's to your existing pattern, so that you capture "jac" followed by any number of word characters optionally followed by "'s"
$q = "/(?<!\\w)(jac\\w*(?:'s)?)/i";
Here, the ?: inside the parentheses means that you don't actually need to capture their content (because they're already inside a pair of parentheses, it's unnecessary), and the ? after the parentheses means that the match is optional.
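If the prefix comes from a variable, as $q does in the question, it may be worth wrapping this in a small function. The sketch below is only an illustration: wordsStartingWith() is a hypothetical name, and preg_quote() is added on the assumption that the prefix should be treated as literal text rather than as a regex.

```php
<?php
/**
 * Return every word in $text that starts with $prefix (case-insensitive),
 * including an optional trailing 's. Hypothetical helper for illustration.
 */
function wordsStartingWith(string $prefix, string $text): array
{
    // preg_quote() escapes regex metacharacters in the prefix; '/' is our delimiter.
    $pattern = '/(?<!\w)(' . preg_quote($prefix, '/') . '\w*(?:\'s)?)/i';
    preg_match_all($pattern, $text, $matches);

    // Group 1 holds the captured words.
    return $matches[1];
}

$text = "I bought a jacket yesterday.\nJack's going home.\nI want to go to Jacksonville.";
print_r(wordsStartingWith('jac', $text)); // jacket, Jack's, Jacksonville
```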

Related

Two or more occurrence of at least one in character set with PHP regex

I want to make PHP regex to find if text has two or more of at least one character in character set {-, l, s, i, a}.
I made like this.
preg_match("/[-lisa]{2,}/", $text);
But this doesn't work.
Please help me.
Matching two or more occurrences means matching two is enough for the check to be valid.
"At least one in character set" might mean either that the same character from the set must occur twice, or that any two characters from the set must occur. For the former, use preg_match('~([-lisa]).*?\1~', $string, $match). Note the single quotes around the string literal: inside double quotes the backreference needs a double backslash (\\1). For the latter, i.e. to match ..l...i.., use preg_match('~[-lisa].*?[-lisa]~', $string, $match) or preg_match('~([-lisa]).*?(?1)~', $string, $match), where (?1) is a regex subroutine that repeats the pattern of the corresponding group.
If your strings contain line breaks, do not forget to add s modifier, preg_match('~([-lisa]).*?\1~s', $string, $match).
Moreover, if you want to check for consecutive character repetition, remove .*? from the above patterns: the same-character check becomes preg_match('~([-lisa])\1~', $string, $match) and the any-two-characters check becomes preg_match('~[-lisa]{2}~', $string, $match) (though, judging by your own feedback, this is not what you want, so this example is here just for the record).
The ([-lisa])\1{2} pattern that you find useful matches a repeated -, l, i, s or a char three times (---, lll, sss, etc.), thus only use it if it fits your requirements.
Note that the preg_match function searches for a match anywhere inside a string and does not require a full string match (thus, there is no need to add .* (or ^.*, .*$) at the start and end of the pattern).
See a sample regex demo, feel free to test your strings in this environment.
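The difference between the two readings is easier to see side by side. A small sketch, with hypothetical function names and made-up test words:

```php
<?php
// True if the SAME character from the set occurs at least twice (not necessarily adjacent).
function hasRepeatedSetChar(string $s): bool
{
    return preg_match('~([-lisa]).*?\1~s', $s) === 1;
}

// True if ANY two characters from the set occur, possibly different ones.
function hasTwoSetChars(string $s): bool
{
    return preg_match('~[-lisa].*?[-lisa]~s', $s) === 1;
}

var_dump(hasRepeatedSetChar('hello')); // true: "l" occurs twice
var_dump(hasRepeatedSetChar('list'));  // false: l, i and s occur once each
var_dump(hasTwoSetChars('list'));      // true: "l" followed by "i"
var_dump(hasTwoSetChars('bar'));       // false: only "a" is in the set
```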

(PHP) How to find words beginning with a pattern and replace all of them?

I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.
A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should see if you can discover which characters are valid in a subreddit name or in a Reddit username and modify the [a-zA-Z0-9_-] character class accordingly.
You are looking for a regular expression.
A basic pattern starts out as a fixed string: /u/ or /r/, each of which matches exactly that text. The two can be combined as /(?:u|r)/, which matches either one. Next you want to match everything from that point up to a space. Use a negated character class, [^ ], which matches any character that is not a space, and apply a quantifier, *, to match as many such characters as possible: /(?:u|r)/[^ ]*
You can take that pattern further and add a lookbehind, (?<= ), to ensure your match is preceded by a space so that you're not matching part of a longer word, which results in (?<= )/(?:u|r)/[^ ]* (note that this also skips a match at the very start of the string, where nothing precedes it). Wrap all of that in parentheses to make a capturing group: ((?<= )/(?:u|r)/[^ ]*). This captures the contents within the parentheses for use in the replacement. You can then express the replacement with \1, the reference to the first captured group, as [\1](http://reddit.com\1).
In php you would pass the matching pattern, replacement pattern, and subject string to the preg_replace function.
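Put together, that might look like the sketch below. It deviates from the pattern above in two places, both assumptions on my part: (?<!\S) ("not preceded by a non-space character") replaces (?<= ) so that a match at the very start of the string is linked too, and + replaces * so that a bare "/r/" is left alone. linkifyReddit() is a hypothetical name.

```php
<?php
// Wrap every /r/x and /u/x in a Markdown link pointing at reddit.com.
function linkifyReddit(string $text): string
{
    return preg_replace(
        '#(?<!\S)(/(?:u|r)/[^ ]+)#',   // group 1: "/r/" or "/u/" plus everything up to a space
        '[$1](http://reddit.com$1)',   // $1 refers to group 1
        $text
    );
}

echo linkifyReddit('Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2'), "\n";
// Contact [/u/someone](http://reddit.com/u/someone) on reddit, or visit
// [/r/subreddit](http://reddit.com/r/subreddit) or [/r/subreddit2](http://reddit.com/r/subreddit2)
```

Because [^ ]+ stops only at a space, trailing punctuation such as the comma in "/r/php," ends up inside the link; a narrower character class avoids that.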
In my opinion regex would be overkill for such a simple operation. If you just want to replace instances of "/r/x" with "[/r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)", you can use str_replace, although preg_replace will need less code.
str_replace("/r/x","[/r/x](http://reddit.com/r/x)","whatever_string");
Use regex for more intricate search-and-replace. You can also use the regular expression generator at http://www.jslab.dk/tools.regex.php if you have something complex to capture in the string.

PHP Regular Expression - Extract Data

I have a long string, and am trying to extract specific data that is delimited in that string by specific words.
For example, here is a subset of the string:
Current Owner 123 Capital Calculated
I am looking to extract
123 Capital
and as you can see it is surrounded by "Current Owner" (with a bunch of arbitrary spaces) to the left and "Calculated" (again with arbitrary spaces) to the right.
I tried this, but I'm a bit new at RegEx. Can anyone help me create a more effective RegEx?
preg_match("/Owner[.+]Calculated/",$inputString,$owner);
Thanks!
A character class defines a set of characters and means "match one character from this set", so [.+] matches a single literal dot or plus sign. Place the dot . and quantifier inside a capturing group instead, and enable the s modifier, which lets the dot match newlines as well.
preg_match('/Owner(.+?)Calculated/s', $inputString, $owner);
echo trim($owner[1]);
Note: + is a greedy operator, meaning it will match as much as it can and still allow the remainder of the regex to match. Use +? instead to prevent greediness meaning "one or more — preferably as few as possible".
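The difference only shows up when the right-hand delimiter occurs more than once. A short sketch; the second "Owner ... Calculated" pair in $input is made-up data added purely to show the effect:

```php
<?php
// Hypothetical input with two "Owner ... Calculated" spans.
$input = "Current Owner 123 Capital Calculated, Previous Owner 456 Holdings Calculated";

// Greedy: .+ runs to the LAST "Calculated" in the string.
preg_match('/Owner(.+)Calculated/s', $input, $greedy);

// Lazy: .+? stops at the FIRST "Calculated" after "Owner".
preg_match('/Owner(.+?)Calculated/s', $input, $lazy);

echo trim($greedy[1]), "\n"; // 123 Capital Calculated, Previous Owner 456 Holdings
echo trim($lazy[1]), "\n";   // 123 Capital
```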
You can use lookarounds as
(?<=Owner)\s*.*?(?=\s+Calculated)
Example usage
$str = "Current Owner 123 Capital Calculated ";
preg_match("/(?<=Owner)\s*.*?(?=\s+Calculated)/", $str, $matches);
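print_r(array_map('trim', $matches));
This gives the output below (the raw match still starts with the whitespace that follows "Owner", hence the trim):
Array ( [0] => 123 Capital )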
Hope this helps, group index #1 is your target:
Owner\s+(\d+\s+\w+)\s+Calculated
You may also want to try a tool like RegExr to help you learn/tinker.
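For completeness, the group-based pattern above could be used as in the sketch below. Note that it assumes the value is always digits followed by a single word, which holds for the sample but may not hold for the real data; the extra spaces in $inputString are there to mimic the "arbitrary spaces" from the question.

```php
<?php
$inputString = "Current Owner     123 Capital     Calculated";

// Group 1 captures "digits, whitespace, one word"; \s+ on both sides absorbs the padding.
if (preg_match('/Owner\s+(\d+\s+\w+)\s+Calculated/', $inputString, $m)) {
    $owner = $m[1];
} else {
    $owner = null; // no match: the text did not have the expected shape
}

var_dump($owner); // string(11) "123 Capital"
```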

Finding match, removing the bits I don't want, and then putting it back in

I'm trying to parse through a file, find a particular match, filter it in some way, and then write that data back into the file with some of the characters removed. I've been trying different things for a couple of hours with preg_split and preg_replace, but my regular expression knowledge is limited, so I haven't made much progress.
I have a large file that has many instances like this [something]{title:value}. I want to find everything between "[" and "}" and remove everything besides the "something" bit.
After that part's done, I want to find everything between "{" and "}" in whatever is left, like {title:value}, and then remove everything besides the "value" part. I'm sure there is some simple method to do this, so even just a resource on how to get started would be helpful.
Not sure if I get your meaning right (and haven't touched PHP for months), what about this?
$matches = array();
preg_match_all("/\[(.*?)\]\{.*?:(.*?)\}/", $str, $matches);
$something = $matches[1]; // $something stores all texts in the "something" part
$value = $matches[2]; // $value stores all texts in the "value" part
Doc for preg_match_all
For the regex pattern \[(.*?)\]\{.*?:(.*?)\}:
We escape all the [, ], { and } characters with a backslash because they have a special meaning in regex and need escaping to match literally.
.*? is a lazy match-all, which matches as few characters as possible until the next token in the pattern can match. It is used instead of .* so that the match stops at the first closing delimiter rather than running on to a later one.
(.*?) is a capturing group; it grabs what we need, and PHP puts those matches in the $matches array.
So the entire thing is: match the [ character, then any string up to the ] character and put it in capturing group 1, then the ]{ characters, then any string up to the : character (no capturing group because we don't care about it), then the : character, then any string up to the } character and put it in capturing group 2.
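A quick way to see what lands in each group is to run the pattern over a string with two tokens. The sample text below is made up for illustration:

```php
<?php
// Hypothetical sample with two [something]{title:value} tokens.
$str = "[first]{title:one} some text [second]{name:two}";

$matches = array();
preg_match_all('/\[(.*?)\]\{.*?:(.*?)\}/', $str, $matches);

print_r($matches[1]); // the "something" parts: first, second
print_r($matches[2]); // the "value" parts: one, two
```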
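You can target both parts in one pattern:
$txt = preg_replace('~\[\K[^]]*(?=])|{[^:}]+:\K[^}]+(?=})~', '', $txt);
\K drops everything matched to its left from the match result.
The lookahead (?=...) ("followed by") performs a check but adds nothing to the match result.
Note that, as written, the match is the "something" or "value" text itself, so the empty replacement deletes those parts and leaves the brackets in place; to keep them instead, capture them and replace the whole token with the captured text.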
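If the goal is to keep only the "something" and "value" parts, as the question describes, one option is to match each whole token and put back just the captured text. This is a sketch under my reading of the question (a "[something]{...}" token is reduced to "something", and a leftover "{title:value}" to "value"); the sample text is made up.

```php
<?php
// Hypothetical sample: one full token and one leftover {title:value} token.
$txt = "[something]{title:value} and later {other:thing}";

// First alternative:  "[something]{...}"  -> group 1 holds "something".
// Second alternative: "{title:value}"     -> group 2 holds "value".
$pattern = '~\[([^\]]*)\]\{[^}]*\}|\{[^:}]*:([^}]*)\}~';

// Only one of the two groups takes part in any given match; the other
// expands to an empty string, so '$1$2' yields whichever part was captured.
$result = preg_replace($pattern, '$1$2', $txt);

echo $result, "\n"; // something and later thing
```

To apply this to a file, the text could be read with file_get_contents() and written back with file_put_contents() around the preg_replace() call.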

Regexp grab all text following last match

I am trying to grab the text after the last number in the string and grab the whole string if it doesn't contain numbers.
The best regex I could come up with is:
([^\d\s]*)$
However, I found that \s and \d aren't supported in MySQL regexp; \s becomes [[:space:]], and I'm not sure what \d is equivalent to.
This is what I'm trying to accomplish:
'1/2 Oz' returns 'Oz'
'2 3/4 Oz' returns 'Oz'
'As needed' returns 'As needed'
This is the regex you will need:
/^.*?(\d+(?=\D*$)\s*)/
And just replace matched text with empty string ""
PHP code:
$s = preg_replace('/^.*?(\d+(?=\D*$)\s*)/', '', 'Foo Oz');
//=> Foo Oz
$s = preg_replace('/^.*?(\d+(?=\D*$)\s*)/', '', '1/2 Oz');
//=> Oz
Live Demo: http://ideone.com/u887D7
First of all, you could simply avoid the class, and use a range instead:
[^0-9[:space:]]*$
But there is one for digits as well (which may actually include non-ASCII digits). The documentation has a list of these. They are called POSIX bracket expressions by the way.
[^[:digit:][:space:]]*$
However, the general problem with this approach is that it doesn't allow for spaces later on in the string (like the one between As and needed). To get those, but still avoid capturing trailing spaces after digits, make sure the first character is neither a space nor a digit, then match the rest of the string as non-digits. In addition, make the whole thing optional, to ensure that it still works with strings ending in a digit.
([^[:digit:][:space:]][^[:digit:]]*)?$
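The pattern can be sanity-checked outside the database. The sketch below runs it through PHP's PCRE engine, which accepts the same POSIX bracket expressions; that MySQL behaves identically is an assumption and may depend on the server version. textAfterLastNumber() is a hypothetical name.

```php
<?php
// Return the text after the last number, or the whole string if it has no digits.
function textAfterLastNumber(string $s): string
{
    // The group is optional, so the pattern always matches (possibly as an
    // empty string at the end); $m[0] is therefore always set.
    preg_match('/([^[:digit:][:space:]][^[:digit:]]*)?$/', $s, $m);
    return $m[0];
}

var_dump(textAfterLastNumber('1/2 Oz'));    // "Oz"
var_dump(textAfterLastNumber('2 3/4 Oz'));  // "Oz"
var_dump(textAfterLastNumber('As needed')); // "As needed"
var_dump(textAfterLastNumber('Size 12'));   // "" (string ends in a digit)
```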
