extract text between two words in php


I have the following URL:
http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego
and I want to extract
B000NO9GT4
which is the ASIN. So far I can search within the string, but not in the way I need. I have looked at split and explode but can't find a way to make them work. The URLs also vary in length, so I can't hardcode offsets. The only approach that makes sense to me is to split the string so that
http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/
becomes the first part
and
B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego
becomes the second part. From the second part I would extract B000NO9GT4.
In the same way, I would like to get the product name LEGO-Ultimate-Building-Set-Pieces from the first part.
I am very bad at regex and can't work this out.
Can somebody show me how to do this in PHP?
Thanks

This grabs both pieces of information that you are looking to capture:
$url = 'http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego';
$path = parse_url($url, PHP_URL_PATH);
if (preg_match('#^/([^/]+)/dp/([^/]+)/#i', $path, $matches)) {
echo "Description = {$matches[1]}<br />"
."ASIN = {$matches[2]}<br />";
}
Output:
Description = LEGO-Ultimate-Building-Set-Pieces
ASIN = B000NO9GT4
Short Explanation:
Any expressions enclosed in ( ) will be saved as a capture group. This is how we get at the data in $matches[1] and $matches[2].
The expression ([^/]+) says to match one or more characters EXCEPT /, so in effect it captures everything in the URL between two / separators. I use this pattern twice. The [ ] defines a character class, which here contains only /; a ^ as the first character inside the brackets negates the class, so instead of matching / it matches everything BUT /. Another example is [a-f0-9], which matches the characters a,b,c,d,e,f and the digits 0,1,2,3,4,5,6,7,8,9. [^a-f0-9] would be the opposite.
# is used as the delimiter for the expression
^ following the delimiter means match from the beginning of the string.
See www.regular-expressions.info and PCRE Pattern Syntax for more info on how regexps work.
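The pattern above requires a / after the ASIN, so a URL that ends right at the ASIN would not match. As a minimal sketch (the function name parseAmazonProductUrl and the returned array keys are made up for illustration, and the second URL is a shortened variant assumed for testing), the same idea can be wrapped in a helper that also accepts the ASIN as the last path segment:

```php
<?php
// Hypothetical helper: name and return shape are illustrative only.
function parseAmazonProductUrl(string $url): ?array
{
    // Work on the path only, so the query string cannot interfere.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    // (?:/|$) accepts either a following slash or the end of the path.
    if (preg_match('#^/([^/]+)/dp/([^/]+)(?:/|$)#i', $path, $m)) {
        return ['description' => $m[1], 'asin' => $m[2]];
    }
    return null;
}

$long  = parseAmazonProductUrl('http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego');
$short = parseAmazonProductUrl('http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4');
$none  = parseAmazonProductUrl('http://www.amazon.com/');
var_dump($long, $short, $none);
```

Returning null for anything that does not look like a product URL keeps the caller from reading $matches entries that were never set.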

You can try
$str = "http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego" ;
list(,$desc,,$num,) = explode("/",parse_url($str,PHP_URL_PATH));
var_dump($desc,$num);
Output
string 'LEGO-Ultimate-Building-Set-Pieces' (length=33)
string 'B000NO9GT4' (length=10)

Related

preg_replace - similar patterns

I have a string that contains something like "LAB_FF, LAB_FF12" and I'm trying to use preg_replace to look for both patterns and replace them with different strings using a pattern match of;
/LAB_[0-9A-F]{2}|LAB_[0-9A-F]{4}/
So input would be
LAB_FF, LAB_FF12
and the output would need to be
DAB_FF, HAD_FF12
Problem is, for the second string, it interprets it as "LAB_FF" instead of "LAB_FF12" and so the output is
DAB_FF, DAB_FF
I've tried splitting the input line out using 2 different preg_match statements, the first looking for the {2} pattern and the second looking for the {4} pattern. This sort of works in that I can get the correct output into 2 separate strings but then can't combine the two strings to give the single amended output.

\b is a word boundary. It matches only at the position where a word starts or ends, so LAB_FF can no longer match inside LAB_FF12.
https://regex101.com/r/upY0gn/1
$pattern = "/\bLAB_[0-9A-F]{2}\b|\bLAB_[0-9A-F]{4}\b/";
Regarding the comment on the other answer about how to replace the string, this is one way.
Each alternative that comes before the one that matched leaves an empty entry in the output array.
In this case there is one (the first).
Then it's just a matter of substr.
$re = '/(\bLAB_[0-9A-F]{2}\b)|(\bLAB_[0-9A-F]{4}\b)/';
$str = 'LAB_FF12';
preg_match($re, $str, $matches);
var_dump($matches);
$substitutes = ["", "DAB", "HAD"];
for ($i = 1; $i < count($matches); $i++) {
    if ($matches[$i] != "") {
        $result = $substitutes[$i] . substr($matches[$i], 3);
        break;
    }
}
echo $result;
https://3v4l.org/gRvHv

You can specify exact amounts in one set of curly braces, e.g. {2,4}.
Just tested this and seems to work:
/LAB_[0-9A-F]{2,4}/
LAB_FF, LAB_FFF, LAB_FFFF
EDIT: My mistake, that actually matches between 2 and 4. If you change the order of your selections it matches the first it comes to, e.g.
/LAB_([0-9A-F]{4}|[0-9A-F]{2})/
LAB_FF, LAB_FFFF
EDIT2: The following will match LAB_even_amount_of_characters:
/LAB_([0-9A-F]{2})+/
LAB_FF, LAB_FFFF, LAB_FFFFFF...
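The answers above show how to match the two forms but stop short of replacing both in one pass. As a rough sketch that combines the \b boundary with the longest-alternative-first ordering, preg_replace_callback can pick the replacement prefix from the length of the captured hex part (the DAB/HAD mapping comes from the question; the variable names are just for illustration):

```php
<?php
$input = 'LAB_FF, LAB_FF12';

$output = preg_replace_callback(
    // Try the 4-character form first, then the 2-character form.
    '/\bLAB_([0-9A-F]{4}|[0-9A-F]{2})\b/',
    function (array $m): string {
        // $m[1] is the hex part; its length decides the new prefix.
        $prefix = strlen($m[1]) === 4 ? 'HAD' : 'DAB';
        return $prefix . '_' . $m[1];
    },
    $input
);

echo $output, "\n"; // DAB_FF, HAD_FF12
```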

Regex After Last / and Before period

Sorry if the title is confusing. All I'm trying to do is some simple regex:
The text: /thing/images/info.gif
And what I want is: info
My regex (not fully working): ([^\/]+$)(.*?)(?=\.gif)
(Note: [^\/]+$ returns info.gif)
Thanks for any help!

I'd say you don't need to match all the string, so you can be much more generic. If you know your string always contains a path you can just use:
preg_match( '/([^\/]+)\.\w+$/', "/thing/images/info.gif", $matches) ;
print_r( $matches );
and it will be valid for any filename that has an extension, even names that contain dots like my_file.name.jpg or spaces like /thing/images/my image.gif
Demo here.
The structure is (from the end of the regex moving to the left):
$ anchors the match at the end of the string
\.\w+ matches a dot followed by one or more word characters (the extension)
([^\/]+) captures one or more characters that are not a slash (your filename; the last slash before it is where the directories end)
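To see how the pattern behaves on the file names mentioned above, here is a small sketch that runs it over a few paths. The README path is an extra case added here as an assumption, to show that a name without an extension does not match:

```php
<?php
$paths = [
    '/thing/images/info.gif',
    '/thing/images/my_file.name.jpg',
    '/thing/images/my image.gif',
    '/thing/images/README', // no extension, so the pattern does not match
];

$names = [];
foreach ($paths as $path) {
    // $m[1] is the file name without its final extension.
    $names[$path] = preg_match('/([^\/]+)\.\w+$/', $path, $m) ? $m[1] : null;
}

var_export($names);
```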

Not sure how much more complex the string is but this seems to work on the test string:
preg_match('![^/.]+(?=\.gif)!', '/thing/images/info.gif', $m);
This matches a run of characters that are neither / nor ., followed by .gif.

In editors (Sublime):
Find:^(.*)(\/)(.*)(\.)(.*)$
Replace it with:\3
In PHP:
<?php
preg_match('/^(.*)(\/)(.*)(\.)(.*)$/', '/thing/images/info.gif', $match);
echo $match[3];
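None of the answers mention it, but if a regex is not a hard requirement, PHP's built-in pathinfo function with the PATHINFO_FILENAME flag returns the file name without its last extension. A minimal sketch, using the paths from the answers above:

```php
<?php
// PATHINFO_FILENAME strips the directory and the final extension.
$a = pathinfo('/thing/images/info.gif', PATHINFO_FILENAME);
$b = pathinfo('/thing/images/my_file.name.jpg', PATHINFO_FILENAME);
$c = pathinfo('/thing/images/my image.gif', PATHINFO_FILENAME);

echo $a, "\n"; // info
echo $b, "\n"; // my_file.name
echo $c, "\n"; // my image
```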

regex to clean up url

I am looking for a way to get a valid url out of a string like:
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
My original solution was:
preg_match('#^[^:|]*#', str_replace('//', '/', $string), $modifiedPath);
But obviously it's going to remove a slash from the http:// as well as the one in the middle of the string.
My expected output that I want from the original is:
http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
I could always break off the http part of the string first but would like a more elegant solution in the form of regex if possible. Thanks.

This will do exactly what you are asking:
<?php
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
preg_match('/^([^|]+)/', $string, $m); // get everything up to and NOT including the first pipe (|)
$string = $m[1];
$string = preg_replace('/(?<!:)\/\//', '/' ,$string); // replace all occurrences of // as long as they are not preceded by :
echo $string; // outputs: http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
exit;
?>
EDIT:
(?<!X) in regular expressions is the syntax for what is called a negative lookbehind. The X is replaced with the character(s) we are testing for.
The following expression would match every instance of double slashes (//):
\/\/
But we need to make sure that the match we are looking for is NOT preceded by the : character, so we need to 'lookbehind' our match to see if the : character is there. If it is, then we don't want it to be counted as a match:
(?<!:)\/\/
The ! is what says NOT to match in our lookbehind. If we changed it to (?<=:)\/\/ then it would only match the double slashes that did have the : preceding them.
A lookahead and lookbehind tutorial can explain it all better than I can.
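To make the difference between the two lookbehind forms concrete, here is a short sketch. The test URL is a shortened, made-up variant of the one in the question, and the variable names are only illustrative:

```php
<?php
$url = 'http://somesite.com/directory//sites/9//file.jpg';

// Negative lookbehind: collapse every // that is NOT preceded by a colon.
$cleaned = preg_replace('/(?<!:)\/\//', '/', $url);

// Positive lookbehind: find only the // that IS preceded by a colon.
preg_match_all('/(?<=:)\/\//', $url, $found, PREG_OFFSET_CAPTURE);
$count  = count($found[0]);
$offset = $found[0][0][1]; // byte offset of the first match

echo $cleaned, "\n"; // http://somesite.com/directory/sites/9/file.jpg
echo $count, ' match at offset ', $offset, "\n"; // 1 match at offset 5
```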

Assuming all your strings are in the form given, you don't need any but the simplest of regexes to do this; if you want an elegant solution, then a regex is probably not what you need. Also, double slashes are legal in a URL, and most web servers treat them the same as a single slash, just as a Unix path does, so you may not need to get rid of them at all.
Why not just
$url = preg_split('/\|/', $string)[0];
?
If you really, really care about getting rid of the double slashes in the URL, then you can follow this with
$url = preg_replace('/([^:])\/\//', '$1/', $url);
or even combine them into
$url = preg_replace('/([^:])\/\//', '$1/', preg_split('/\|/', $string)[0]);
although that last form gets a little bit hairy.

Since this is a quite strictly defined situation, I'd consider just one preg to be the most elegant solution.
Off the top of my head:
$sanitizedURL = preg_replace('~((?<!:)/(?=/)|\\|.+)~', '', $rawURL);
Basically, this looks for any forward slash that IS NOT preceded by a colon (:) and IS followed by another forward slash. It also matches the first pipe character and every character following it.
Anything found is removed from the result.
I can explain the RegEx in more detail if you like.
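As a quick check of the single-regex approach, the sketch below runs it on the string from the question and on a second, invented input (example.com with three consecutive slashes) to show that the lookahead removes every extra slash in a run, not just one:

```php
<?php
$pattern = '~((?<!:)/(?=/)|\\|.+)~';

$rawURL = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
$sanitizedURL = preg_replace($pattern, '', $rawURL);

// Hypothetical input with a run of three slashes and a trailing pipe section.
$tripleSlash = preg_replace($pattern, '', 'https://example.com/a///b.png|x');

echo $sanitizedURL, "\n"; // http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
echo $tripleSlash, "\n";  // https://example.com/a/b.png
```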

regex to find all text after delimited string

I have some content that contains a token string in the form
$string_text = '[widget_abc]This is some text. This is some text, etc...';
And I want to pull all the text after the first ']' character
So the returned value I'm looking for in this example is:
This is some text. This is some text, etc...

preg_match("/^.+?\](.+)$/is" , $string_text, $match);
echo trim($match[1]);
Edit
As per author's request - added explanation:
preg_match(param1, param2, param3) is a function that looks for the first match of a regular expression in a string
param1 = "/^.+?\](.+)$/is"
the / at each end are the delimiters you put around your regular expression in param1
the i at the end means case insensitive (it doesn't care if your letters are 'a' or 'A')
s - lets . match newline characters too, so the match can span multiple lines
^ - start the check from the beginning of the string
$ - go all the way to the end of the string
. - represents any character
.+ - one or more characters of anything
.+? - one or more characters of anything, but as few as possible (a lazy match)
.+?\] - one or more characters of anything until you reach the first ] (the backslash before ] makes sure it is read as a literal character, because square brackets have a special meaning in regular expressions)
(.+)$ - capture everything after ] and store it as a separate element in the array defined in param3
param2 = the string that you created.
I tried to simplify the explanations, I might be off, but I think I'm right for the most part.

The regex (?<=]).* will solve this problem if you can guarantee that there are no other square brackets on the line. In PHP the code will be:
if (preg_match('/(?<=\]).*/', $input, $group)) {
$match = $group[0];
}
This will transform [widget_abc]This is some text. This is some text, etc... into This is some text. This is some text, etc.... It matches everything that follows the ].

$output = preg_replace('/^[^\]]*\]/', '', $string_text);

Is there any particular reason why a regex is wanted here?
echo substr(strstr($string_text, ']'), 1);

A regex is definitely overkill for this instance.
Here is a nice one-liner :
list(, $result) = explode(']', $inputText, 2);
It does the job and is way less expensive than using regular expressions.
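For comparison, here is a small sketch that runs each of the approaches from the answers above on the string from the question and collects the results. The array keys are labels invented for this example:

```php
<?php
$string_text = '[widget_abc]This is some text. This is some text, etc...';
$results = [];

// Capture group after the first ].
if (preg_match('/^.+?\](.+)$/is', $string_text, $m)) {
    $results['capture'] = trim($m[1]);
}
// Lookbehind: everything that follows a ].
if (preg_match('/(?<=\]).*/', $string_text, $g)) {
    $results['lookbehind'] = $g[0];
}
// Remove everything up to and including the first ].
$results['preg_replace'] = preg_replace('/^[^\]]*\]/', '', $string_text);
// Plain string functions.
$results['strstr'] = substr(strstr($string_text, ']'), 1);
$parts = explode(']', $string_text, 2);
$results['explode'] = $parts[1];

var_export($results);
```

On this input all five entries hold the same text, so the choice comes down to readability and how the code should behave when no ] is present.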

Regular Expression to collect everything after the last /

I'm new at regular expressions and wonder how to phrase one that collects everything after the last /.
I'm extracting an ID used by Google's GData.
my example string is
http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123
Where the ID is: p1f3JYcCu_cb0i0JYuCu123
Oh and I'm using PHP.

This matches at least one of (anything not a slash) followed by end of the string:
[^/]+$
Notes:
No parens because it doesn't need any groups - result goes into group 0 (the match itself).
Uses + (instead of *) so that if the last character is a slash it fails to match (rather than matching empty string).
But, most likely a faster and simpler solution is to use your language's built-in string list processing functionality - i.e. ListLast( Text , '/' ) or equivalent function.
For PHP, the closest function is strrchr which works like this:
strrchr( Text , '/' )
This includes the slash in the results - as per Teddy's comment below, you can remove the slash with substr:
substr( strrchr( Text, '/' ), 1 );
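In PHP the pattern needs delimiters, and using # instead of / avoids having to escape the slash. A minimal sketch of both the regex and the strrchr variant follows; the variable names are illustrative:

```php
<?php
$url = 'http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123';

// Regex: one or more non-slash characters at the end of the string.
$viaRegex = preg_match('#[^/]+$#', $url, $m) ? $m[0] : null;

// With a trailing slash there is nothing to match, so the result is null.
$trailing = preg_match('#[^/]+$#', $url . '/', $m2) ? $m2[0] : null;

// strrchr returns the last slash and everything after it; substr drops the slash.
$viaStrrchr = substr(strrchr($url, '/'), 1);

var_dump($viaRegex, $trailing, $viaStrrchr);
```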

Generally:
/([^/]*)$
The data you want would then be the match of the first group.
Edit: Since you're using PHP, you could also use strrchr, which returns everything from the last occurrence of a character in a string up to the end. Or you could use a combination of strrpos and substr: first find the position of the last occurrence, then get the substring from that position up to the end. Or use explode and array_pop: split the string at the / and take just the last part.
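The last two options in that edit could look roughly like this. It is only a sketch, and the fallback to the whole string when no slash is found is an assumption about the desired behavior:

```php
<?php
$url = 'http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123';

// strrpos + substr: locate the last slash, then take what follows it.
$pos = strrpos($url, '/');
$viaStrrpos = ($pos === false) ? $url : substr($url, $pos + 1);

// explode + array_pop: split on slashes and keep the last piece.
// array_pop needs a variable, so the parts are stored first.
$parts = explode('/', $url);
$viaExplode = array_pop($parts);

var_dump($viaStrrpos, $viaExplode);
```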

You can also get the "filename", or the last part, with the basename function.
<?php
$url = 'http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123';
echo basename($url); // "p1f3JYcCu_cb0i0JYuCu123"
On my box I could just pass the full URL. It's possible you might need to strip off http:/ from the front.
Basename and dirname are great for moving through anything that looks like a unix filepath.

/^.*\/(.*)$/
^ = start of the row
.*\/ = greedy match from the start of the row to the last occurrence of /
(.*) = group of everything that comes after the last occurrence of /

You can also use a normal string split:
$str = "http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123";
$s = explode("/",$str);
print end($s);

This pattern will not capture the last slash in $0, and it won't match anything if there are no characters after the last slash.
/(?<=\/)([^\/]+)$/
Edit: but it requires lookbehind, which is not supported by older ECMAScript implementations (JavaScript before ES2018, ActionScript), Ruby 1.8 or a few other flavors. If you are using one of those flavors, you can use:
/\/([^\/]+)$/
But it will capture the last slash in $0.

Not a PHP programmer, but strrpos seems a more promising place to start. Find the rightmost '/', and everything past that is what you are looking for. No regex used.
Find position of last occurrence of a char in a string

Based on @Mark Rushakoff's answer, a solution that also handles a query string and fragment:
<?php
$path = "http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123?var1&var2#hash";
$vars = strrchr($path, "?"); // ?var1&var2#hash
var_dump(preg_replace('/'. preg_quote($vars, '/') . '$/', '', basename($path))); // p1f3JYcCu_cb0i0JYuCu123
?>
Regular Expression to collect everything after the last /
How to get file name from full path with PHP?
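A possible alternative to stripping the query string by hand, sketched here as an assumption and not as part of the original answer, is to let parse_url isolate the path before calling basename:

```php
<?php
$path = "http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123?var1&var2#hash";

// PHP_URL_PATH drops the scheme, host, query string and fragment.
$urlPath = parse_url($path, PHP_URL_PATH);
$id = is_string($urlPath) ? basename($urlPath) : null;

var_dump($id);
```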
