I would like to get the URLs from a webpage that start with "http://example.com/category/" from the tags below:
<td>Test</td>
Note:
257849 = random number
Any suggestion would be very much appreciated.
Thanks!
Just specify the fixed base URL as is in the regex (with the dots escaped so they match literally), and use [\w/]+ to match any combination of letters, digits, underscores and the / slash afterwards:
preg_match('#http://example\.com/category/[\w/]+#', $text, $match);
print $match[0];
And to extract all urls at once, use preg_match_all() instead.
preg_match_all('#http://example.com/category[^"]+#', $text, $a);
The full matches will be in $a[0].
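As a minimal sketch of the preg_match_all() approach: the markup in $text below is a hypothetical sample (the real page's HTML is not shown in the question), and the character class [^"\s]+ is one assumed way to stop at the closing quote of an href attribute.

```php
<?php
// Hypothetical sample markup; only the URL prefix comes from the question.
$text = '<td><a href="http://example.com/category/257849/">Test</a></td>'
      . '<td><a href="http://example.com/other/1/">Skip</a></td>'
      . '<td><a href="http://example.com/category/31337/">Test 2</a></td>';

// Escape the dots so they match literally; stop at a quote or whitespace.
preg_match_all('#http://example\.com/category/[^"\s]+#', $text, $matches);

// $matches[0] holds every full-pattern match, in document order.
$urls = $matches[0];
print_r($urls);
```

For anything beyond a quick extraction, a real HTML parser is usually more robust than a regex, but for a fixed URL prefix like this the pattern is often enough.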
I am trying to extract the digits from between the words in this string.
110.0046102.005699.0008103.0104....
I want to extract 4 digits after dot (point/period).
110.0046
102.0056
99.0008
103.0104
I was wondering if this is possible to do with a regular expression or if I should use some other way.
// replace the variable $numbers with your numbers
$numbers = "110.0046102.005699.0008103.0104";
preg_match_all("#\d+\.\d{4}#", $numbers, $matches);
var_dump($matches); // outputting all matches
https://regex101.com/r/oG1dK1/1 -> you can see the regex in action here. The numbers are in the box MATCH INFORMATION on the right.
Try this regex:
(\d{1,}\.\d{4})
Demo here: https://regex101.com/r/uJ1wU6/1
I'm a regex-noobie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I'm trying to achieve is getting the number sequence (aka Job-ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can have multiple numbers in it. I only need the sequence right before ".aspx". Also, there could be some query parameters behind the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
//=> 146370543
This matches a run of characters that are neither a hyphen nor a dot and that is immediately followed by .aspx, which is checked with the lookahead (?=\.aspx).
You can just use preg_match (you don't need preg_replace, as you don't want to change the original string) and capture the number before the .aspx. Assuming .aspx is always at the end of the string, the simplest way I could think of is:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
$regex = '/([0-9]+)\.aspx$/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of matches. The first element holds the text matched by the complete regex, which is 146370543.aspx in this example. The second element holds the group captured by the parentheses around [0-9]+.
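The $ anchor in the pattern above assumes nothing follows .aspx. Since the question mentions that something like "&mobile=true" may come after it, here is a hedged variant that also accepts a query or fragment start after the extension. The function name extractJobId is hypothetical, and the set of allowed follow-up characters (?, & and #) is an assumption.

```php
<?php
// Hypothetical helper: returns the digits directly before ".aspx", or null if none.
function extractJobId(string $url): ?string
{
    // After ".aspx" allow either the end of the string or the start
    // of a query/fragment (assumed to begin with ?, & or #).
    if (preg_match('/(\d+)\.aspx(?=$|[?&#])/i', $url, $m)) {
        return $m[1];
    }
    return null;
}

$url = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx';
echo extractJobId($url), "\n";
echo extractJobId($url . '&mobile=true'), "\n";
```

Because the digits must sit immediately before ".aspx", other numbers earlier in the URL are skipped.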
You can get the opposite by using this regex:
(\D*)\d+(.*)
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
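And if you just want the number for that URL, you can use this regex (it captures the first run of digits, so it only works when the Job-ID is the only number in the URL):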
(\d+)
I'm trying to convert a Notepad++ regex to a PHP regular expression which basically gets IDs from a list of URLs in this format:
http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html
http://www.example.com/category-example/1471337-text-blah-blah-2-blah-2010.html
Using the Notepad++ regex function I get the output that I need in two steps (a list of comma-separated IDs):
(.*)/ replace with space
-(.*) replace with comma
Result:
1371937,1471337
I tried to do something similar with PHP preg_replace but I can't figure out the correct regex. The example below removes everything except digits, but it doesn't work as expected since there can also be numbers that do not belong to the ID.
<?php
$bb = preg_replace('/[^0-9]+/', ',', $_POST['Text']);
?>
Which is the correct structure?
Thanks
If you are matching against:
http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html
To get:
1371937
You could use:
$url = "http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html";
preg_match( "/[^\d]+(\d+)-/", $url, $matches );
$code = $matches[1];
This matches a run of non-digit characters, then captures an unbroken run of digits that is followed by a '-'.
If all you want to do is find the ID, then you should use preg_match, not preg_replace.
You've got lots of options for the pattern, the simplest being:
$url = 'http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html';
preg_match('/\d+/', $url, $matches);
echo $matches[0];
Which simply finds the first bunch of numbers in the URL. This works for the examples.
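To get the comma-separated list from the question in one pass, preg_match_all() can collect the ID from every URL in the input. This is a sketch under the assumption that the ID is always the run of digits right after a slash and right before the first hyphen of the slug; $text stands in for the posted list.

```php
<?php
// The two sample URLs from the question, one per line.
$text = "http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html\n"
      . "http://www.example.com/category-example/1471337-text-blah-blah-2-blah-2010.html";

// A slash, then the ID digits (captured), then the hyphen that starts the slug.
preg_match_all('~/(\d+)-~', $text, $matches);

// $matches[1] holds the captured IDs; join them with commas.
$ids = implode(',', $matches[1]);
echo $ids; // 1371937,1471337
```

Anchoring on the slash before the digits keeps trailing numbers such as 2012 or the lone 2 in the slug out of the result.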
I'm looking for a regex pattern that will return N slugs/chunks (all pieces of the URL, separated or split on the "/" char.) as matches from a "friendly" URL.
The pattern should not include the domain or a leading slash.
Also, the pattern should work with an unknown number of slugs and/or slashes.
For example, some example URLs and desired returned slugs/chunks:
"" = array()
"foo/bar/" = array('foo', 'bar')
"foo/bar/baz" = array('foo', 'bar', 'baz')
"foo-bar/baz" = array('foo-bar', 'baz')
Finally, I need to pass this regex pattern to preg_match (or similar) and have it return the results via the function's $matches parameter.
For example:
<?php preg_match($your_pattern, $friendly_url, $your_pattern_matches); ?>
... similar results can be produced using explode().
This pattern is being used in a much more complex scenario than my little example, one that forces me to use regex patterns via preg_match. Basically, I'm passing preg_match a pattern of choice, which is why I need a regex pattern as opposed to simply using explode.
Your help is GREATLY appreciated!
Cheers!
First of all, check the manual for preg_split:
$segments = preg_split('[/]', $uri, 0, PREG_SPLIT_NO_EMPTY);
If you insist on a preg_match-style call, take a look at this:
$uri = '/foo-bar/baz';
preg_match_all('%[^/]+%', $uri, $matches);
print_r($matches);
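The preg_match_all() call above can be checked against the examples from the question. In this sketch the wrapper name splitSlugs is hypothetical; it simply returns the full-match list, which is empty when the input contains no slugs.

```php
<?php
// Hypothetical wrapper around the pattern above: one match per run of non-slash characters.
function splitSlugs(string $uri): array
{
    preg_match_all('%[^/]+%', $uri, $matches);
    // With the default PREG_PATTERN_ORDER, index 0 is the list of full matches.
    return $matches[0];
}

print_r(splitSlugs('foo/bar/'));    // foo, bar
print_r(splitSlugs('/foo-bar/baz')); // foo-bar, baz
print_r(splitSlugs(''));            // empty array
```

Leading, trailing and doubled slashes produce no empty entries, because an empty run can never satisfy [^/]+.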
Sounds like explode() would do the job without having to bother with regexes:
$matches = explode('/', $url);
Sorry, but I don't think you can do what you want with preg_match.
The documentation says that preg_match stops at the first match. You want an array of the matches in a friendly URL, but that can only be achieved either by multiple matches (to store each value in the array) or by a single match that captures the whole thing. Neither case fits, so I am afraid you would have to use something other than preg_match.
I need to take a url like this:
https://www.domain.com/m/281/[imagename].jpg
and turn it into this:
http://www.NEWdomain.com/images/[imagename].jpg
I will need to do this to many URLs, so I want to write a quick PHP script to put the URLs in an array and then loop to change the domain name and remove the file structure in the original URLs. Not all the original URLs are /m/281; some are slightly different.
I thought I could do a str_replace for the https://www.domain.com to http://www.NEWdomain.com, but I am stumped with how to change the varying /m/281/ in the URLs to my file structure like /images/.
Would a regular expression be best to solve this problem?
You could try something like:
strip off the "https://"
do a str_replace() as you said on the domain
split the string into an array based on "/": explode("/", $urlString);
loop through and remove any elements after the URL element but not the last.
result will be:
$arr[0] = www.NEWdomain.com
$arr[1] = [imagename].jpg
then just insert before the last element "images"
result will then be:
$arr[0] = www.NEWdomain.com
$arr[1] = images
$arr[2] = [imagename].jpg
finally implode it back to a string:
$blah = implode("/", $arr);
Why don't you try a URL parsing function such as parse_url, then take each component and do a simpler string replace?
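A minimal sketch of the parse_url idea, assuming every input is a well-formed URL whose last path segment is the image file name. The function name moveImageUrl and the sample file names are hypothetical; the target host and the /images/ directory come from the question.

```php
<?php
// Hypothetical helper: keep only the file name and put it under the new host's /images/.
function moveImageUrl(string $url): string
{
    // Extract the path component, e.g. "/m/281/photo.jpg".
    $path = (string) parse_url($url, PHP_URL_PATH);
    // basename() drops the directory part, whatever its depth.
    return 'http://www.NEWdomain.com/images/' . basename($path);
}

// Sample inputs with differing directory structures (file names are made up).
$old = [
    'https://www.domain.com/m/281/photo.jpg',
    'https://www.domain.com/x/7/9/logo.jpg',
];
$new = array_map('moveImageUrl', $old);
print_r($new);
```

Since the directory part is discarded rather than matched, this does not care whether the old path is /m/281/ or something slightly different.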
If you want to change all image urls from all paths, this tested function should do the trick.
function fixurls($text) {
    $re = '% # Match image urls in domain.com
        https://www\.domain\.com/       # Required domain.
        (?:[^\s/]+/)*                   # Optional pathname.
        ([^\s/]+\.(?:jpe?g|png|gif))    # $1: Filename (images only); extensions grouped so the alternation covers the extension alone.
        \b                              # Anchor to word boundary.
        %xim';
    // Fix all image URLs in $text string.
    $replace = 'http://www.NEWdomain.com/images/$1';
    $text = preg_replace($re, $replace, $text);
    return $text;
}
You can easily modify the path portion of the regex if you only wish to change images from specific paths.
Your regular expression could match /[a-zA-Z]/[0-9]*/, if I didn't make a bad assumption about your old pattern.
I think what you need is preg_replace().
If only the first two subdirectory segments are variable, you could try:
$src = preg_replace(
    '~https?://www\.domain\.com/\w+/\d+/(.*?\.jpg)~',  // match regex
    'http://www.NEWdomain.com/images/$1',              // replacement
    $src);
The \w+ matches word characters (letters, digits and underscore), and \d+ matches decimal digits. The .*? works on almost anything, since you didn't give any criteria for the filename.
In the replacement string the $1 just becomes what was previously matched with the ( capture ) parens.