I am about to start working on a project that requires me to do the following:
load the source code of a webpage into a string with file_get_contents
find a certain substring in the first string, that reads "Your code: 6-digit-number-here, with a dash after every 2 digits"
save the first occurrence of the substring into a text file
do the same for each occurrence of the substring in the string
The 6-digit number is different for each occurrence in the source code. How do I define that number in the substring, so I can properly search for it, and how can I make it save every occurrence of the defined substring? Help would be greatly appreciated.
You could use regular expressions to match all codes. In this example the variable $matches would contain all matches from the html string:
$html = file_get_contents('[url]');
preg_match_all('/[0-9]{2}\-[0-9]{2}\-[0-9]{2}/', $html, $matches);
var_dump($matches);
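The question also asks about saving each occurrence to a text file. A minimal sketch of that step, assuming a sample $html string in place of the fetched page and a hypothetical output file named codes.txt in the system temp directory (neither is from the original post):

```php
<?php
// Sample input standing in for file_get_contents('[url]') (assumption).
$html = '<p>Your code: 12-34-56</p><p>Your code: 65-43-21</p>';

// Capture only the number that follows the literal "Your code: " prefix.
preg_match_all('/Your code: (\d{2}-\d{2}-\d{2})/', $html, $matches);
$codes = $matches[1];

// Hypothetical output path; every code goes on its own line.
$outFile = sys_get_temp_dir() . '/codes.txt';
file_put_contents($outFile, implode(PHP_EOL, $codes) . PHP_EOL);

print_r($codes);
```

Anchoring on the "Your code: " prefix keeps unrelated dash-separated numbers elsewhere in the page from being picked up.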
Suppose you have a variable like
$username = "abcd#somedomain.com";
and you want only the part before the #. Try using this:
$username = substr($username, 0, strpos($username, '#'));
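One caveat with this approach: strpos() returns false when the separator is missing, and substr() would then return an empty string. A small sketch that guards against that case, using the same sample value (the fallback of keeping the whole string is an assumption about the desired behaviour):

```php
<?php
$username = "abcd#somedomain.com";

$pos = strpos($username, '#');

// strpos() returns false when '#' is absent, so check before slicing.
$name = ($pos === false) ? $username : substr($username, 0, $pos);

echo $name; // abcd
```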
Related
I have a text file that contains multiple occurrences of a substring - pin="*****" where * equals a number.
The pin number inside the enclosing quotes can be of any length.
What I would like to do is extract the line pin="*****" or, even better, just the number inside it, and add it to an array for later processing.
An example of what the text file looks like:
[123456]
mystring="srthjkgnyjh"
pin="9898"
anotherstring="jghksdfghjkdfg6788678345hjkfsd"
[654321]
mystring="hksfkhjsjl"
pin="4343434"
anotherstring="kdgig89794578945789jkhflsf7865"
etc..
Ideas are much appreciated
Thank you
You can use this:
preg_match_all('~(?<=\bpin=")[0-9]+(?=")~', $str, $matches);
$result = $matches[0];
(?<=..) and (?=..) are a lookbehind and a lookahead: they check what comes before and after the digits without including it in the match. You can find more information about them here.
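As an illustration, here is the same pattern run against text in the format from the question. This is a sketch that assumes the file contents have already been read into a string; the nowdoc sample is shortened from the question's example:

```php
<?php
// Sample text in the format shown in the question (assumption: already loaded).
$str = <<<'TXT'
[123456]
mystring="srthjkgnyjh"
pin="9898"
anotherstring="jghksdfghjkdfg6788678345hjkfsd"
[654321]
mystring="hksfkhjsjl"
pin="4343434"
TXT;

// The lookbehind requires pin=" before the digits and the lookahead requires "
// after them, but neither ends up in the match itself.
preg_match_all('~(?<=\bpin=")[0-9]+(?=")~', $str, $matches);
$result = $matches[0];

print_r($result); // contains '9898' and '4343434'
```

The digits inside anotherstring are skipped because they are not preceded by pin=".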
I have a string that contains variables in the format {namespace:name}. I am trying to create a regular expression for finding all of the variables in the string. I have the following so far, but it isn't working:
$str = " {user:fname} and the last name = {user:lname}";
var_dump(preg_match_all("/^\{(\w):(\w)\}/", $str, $matches));
var_dump($matches);
But it isn't finding any of the tags. The variables can have any word for namespace and name, but letters only with no spaces. Any help would be appreciated.
Update
I tried the following also and received no results: "/\{(\w):(\w)\}/"
Remove the anchor ^ from the regex and allow variables with a length of more than one character.
/^\{(\w):(\w)\}/
becomes:
/\{(\w+):(\w+)\}/
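Note that \w also matches digits and underscores. Since the question asks for letters only, a stricter variant could use [a-zA-Z]+ instead. A short sketch with the string from the question:

```php
<?php
$str = " {user:fname} and the last name = {user:lname}";

// [a-zA-Z]+ restricts both parts to letters; \w+ would also accept digits and "_".
preg_match_all('/\{([a-zA-Z]+):([a-zA-Z]+)\}/', $str, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[0] is the full tag, $m[1] the namespace, $m[2] the name.
    echo "{$m[1]} => {$m[2]}\n";
}
```

With PREG_SET_ORDER each element of $matches holds one complete tag with its two captured parts, which tends to be easier to loop over.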
I got the following URL
http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego
and I want to extract
B000NO9GT4
That is the ASIN. So far I can search within the string, but not in the way I need. I looked at the split function and at explode, but I can't find a way to do it. Also, the URLs will differ in length, so I can't hardcode the length either. The only thing that makes sense to me is to split the string so that
http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/
become first part
and
B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego
becomes the second part. From the second part, I should extract B000NO9GT4.
In the same way, I would want to get the product name LEGO-Ultimate-Building-Set-Pieces from the first part.
I am very bad at regex and can't find a way to do it.
Can somebody guide me on how I can do this in PHP?
thanks
This grabs both pieces of information that you are looking to capture:
$url = 'http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego';
$path = parse_url($url, PHP_URL_PATH);
if (preg_match('#^/([^/]+)/dp/([^/]+)/#i', $path, $matches)) {
echo "Description = {$matches[1]}<br />"
."ASIN = {$matches[2]}<br />";
}
Output:
Description = LEGO-Ultimate-Building-Set-Pieces
ASIN = B000NO9GT4
Short Explanation:
Any expressions enclosed in ( ) will be saved as a capture group. This is how we get at the data in $matches[1] and $matches[2].
The expression ([^/]+) says to match all characters EXCEPT / so in effect it captures everything in the URL between the two / separators. I use this pattern twice. The [ ] actually defines the character class which was /, the ^ in this case negates it so instead of matching / it matches everything BUT /. Another example is [a-f0-9] which would say to match the characters a,b,c,d,e,f and the numbers 0,1,2,3,4,5,6,7,8,9. [^a-f0-9] would be the opposite.
# is used as the delimiter for the expression
^ following the delimiter means match from the beginning of the string.
See www.regular-expressions.info and PCRE Pattern Syntax for more info on how regexps work.
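The character-class behaviour described above can be checked with a few quick tests. This is only an illustrative sketch; the sample strings are made up:

```php
<?php
// Every character is in the class a-f or 0-9, so this matches.
$hex = preg_match('/^[a-f0-9]+$/', 'deadbeef42');

// No character is in the class, so the negated class matches.
$notHex = preg_match('/^[^a-f0-9]+$/', 'XYZ-!');

// "x", "y" and "z" are outside a-f, so the plain class does not match.
$fail = preg_match('/^[a-f0-9]+$/', 'xyz');

var_dump($hex, $notHex, $fail); // int(1), int(1), int(0)
```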
You can try
$str = "http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego" ;
list(,$desc,,$num,) = explode("/",parse_url($str,PHP_URL_PATH));
var_dump($desc,$num);
Output
string 'LEGO-Ultimate-Building-Set-Pieces' (length=33)
string 'B000NO9GT4' (length=10)
I'm trying to convert a Notepad++ Regex to a PHP regular expression which basically get IDs from a list of URL in this format:
http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html
http://www.example.com/category-example/1471337-text-blah-blah-2-blah-2010.html
Using the Notepad++ regex function I get the output that I need in two steps (a list of comma-separated IDs):
(.*)/ replace with space
-(.*) replace with comma
Result:
1371937,1471337
I tried to do something similar with PHP preg_replace, but I can't figure out how to get the correct regex. The example below removes everything except digits, but it doesn't work as expected, since there can also be numbers that do not belong to the ID.
$bb = preg_replace('/[^0-9]+/', ',', $_POST['Text']);
Which is the correct structure?
Thanks
If you are matching against:
http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html
To get:
1371937
You would:
$url = "http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html";
preg_match( "/[^\d]+(\d+)-/", $url, $matches );
$code = $matches[1];
This matches a run of non-numeric characters, then captures an unbroken string of digits that is followed by a '-'.
If all you want to do is find the ID, then you should use preg_match, not preg_replace.
You've got lots of options for the pattern, the simplest being:
$url = 'http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html';
preg_match('/\d+/', $url, $matches);
echo $matches[0];
Which simply finds the first bunch of numbers in the URL. This works for the examples.
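To get the comma-separated list the question asks for, the same idea can be applied to the whole list at once with preg_match_all. A sketch, assuming the posted text holds one URL per line; the $text variable stands in for $_POST['Text']:

```php
<?php
// Sample input standing in for $_POST['Text'] (assumption): one URL per line.
$text = "http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html\n"
      . "http://www.example.com/category-example/1471337-text-blah-blah-2-blah-2010.html";

// Match digits that directly follow a slash and are followed by a dash,
// so trailing numbers such as 2012 or the lone 2 are ignored.
preg_match_all('~/(\d+)-~', $text, $matches);
$ids = implode(',', $matches[1]);

echo $ids; // 1371937,1471337
```

Requiring the leading slash is a design choice that relies on the ID always being the first thing in the last path segment, as in the two example URLs.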
I have a URL that is in the following structure: http://somewebsite.com/directory1/directory2/directory3...
I'm trying to get the last directory name from this URL, but the depth of the URL isn't always constant, so I don't think I can use a simple substr or preg_match call. Is there a function to get the last instance of a regular expression match from a string?
Just use:
basename( $url )
It should have the desired effect
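If the URL may carry a query string, basename() alone would include it in the result. A hedged sketch that extracts the path first; the trailing slash and the ?page=2 query are assumptions added for illustration:

```php
<?php
// Example URL with a trailing slash and a query string (both are assumptions).
$url = 'http://somewebsite.com/directory1/directory2/directory3/?page=2';

// Keep only the path part, dropping scheme, host and query.
$path = parse_url($url, PHP_URL_PATH);

// basename() ignores the trailing slash and returns the last path component.
$last = basename($path);

echo $last; // directory3
```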
Torben's answer is the correct way to handle this specific case. But for posterity, here is how you get the last instance of a regular expression match:
preg_match_all('/pattern/', 'subject', $matches, PREG_SET_ORDER);
$last_match = end($matches); // or array_pop(), but it modifies the array
$last_match[0] contains the complete match, $last_match[1] contains the first parenthesized subpattern, etc.
Another point of interest: your regular expression '/\/([^/])$/' should work as-is because the $ anchors it to the end.
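As quoted here, that pattern uses / both as the delimiter and unescaped inside the character class, and the class has no quantifier, so it would likely need a small adjustment before PHP accepts it. A sketch of one possible variant, using # as the delimiter (the sample URL is taken from the question without its trailing dots):

```php
<?php
$url = 'http://somewebsite.com/directory1/directory2/directory3';

// '#' as delimiter avoids escaping '/'; '+' allows more than one character;
// '$' anchors the match to the end of the string.
if (preg_match('#/([^/]+)$#', $url, $m)) {
    echo $m[1]; // directory3
}
```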