Extract multiple lines of text from string PHP - php

I have a string of registry keys that looks like this:
ThreatName REG_SZ c:\temp Protection Code REG_SZ a Check ThreatName REG_SZ c:\windows Protection
I want to extract "c:\WHATEVER" from the string. It occurs multiple times between the words "ThreatName REG_SZ" and "Protection".
How can I extract "c:\WHATEVER" multiple times using PHP?

One way to do it is using Regular Expressions, here is an example code (un-tested).
$string = "ThreatName REG_SZ c:\\temp Protection
Code REG_SZ c:\\a Check
ThreatName REG_SZ c:\\windows Protection";
preg_match_all("~.* REG_SZ (.*) ~iU", $string, $matches);
print_r($matches);
If you want to understand more see the php manual for: preg_match_all(). Or google regular expressions for more information on them. But basically it looks between the REG_SZ and Protection (the U modifier makes it ungreedy so it will look for the first Protection) and returns everything but the new line character (the .*). If this is spread across new lines, the 's' modifier will help resolve that.
EDIT: Saw that you wanted them all. This should work for all of them.
EDIT: Fixed the regex to include "ThreatName", not sure if this is dynamic. Also added extra slashes to the string as they were being parsed as characters.
I am not sure if you will have to use addslashes() on the string or not, but it maybe needed.
Removed the isset as it was not necessary.
EDIT: Modified the code given that the correct output formatting was omitted. The updated method will work, but if the directory has a space in it, chances are it will only pull the first part of the directory.

Related

Using wild cards in str_replace URL

i'm trying to use str_replace to edit urls in a csv import in Wordpress, using WP All Import.
This code works
[str_replace("https://oldsite.com/wp-content/uploads/2022/10/Image-Download.jpg", "http://newsite.com/wp-content/uploads/", {title_slider[1]})]
The problem is that not all the uploads in the old site rest in 2022/10 ... so i was wondering if there was any way to use a wildcard to replace 2022 for any year and 10 for any month ?
I tried uploads/*
I hoped that it might accept that but what is being produced is a mixed URL of both oldsite and newsite.
I know this would not work in a browser to navigate to the file, but i only require str_replace.
Current outcome https://oldsite.com/newsite.com/wp-content/uploads/2022/10/Image.jpg
Looking up the documentation for that plugin, confirms that the syntax you're using is running an ordinary PHP function.
So to find the direct answer to your question, we can look up the PHP manual for the str_replace function. It's rather straight-forward:
This function returns a string or an array with all occurrences of search in subject replaced with the given replace value.
No special syntax for wildcards or pattern matching. However, that page does link a couple of times to another function, preg_replace, which:
Searches subject for matches to pattern and replaces them with replacement.
That sounds more promising, right? Specifically, it uses regular expressions, which are a common and very powerful way to specify text patterns.
If you haven't heard of "regular expressions" before, the details might be confusing, but there are lots of tutorials and cheat sheets online.
For example, to match "exactly two digits" you can write \d\d, or \d{2}. In PHP, you would write that like this - note the special / or # delimiters around the pattern, inside the quotes; they can be pretty much any character, but if it appears in the pattern, it has to be "escaped", so best to choose something that doesn't appear:
$new_value = preg_replace('/\d\d/', '99', $old_value);
// or
$new_value = preg_replace("/\d\d/", "99", $old_value);
// or
$new_value = preg_replace('#\d\d#', '99', $old_value);
// or
$new_value = preg_replace("#\d\d#", "99", $old_value);
Translating that example into the plugin's syntax, and noting this from the plugin manual:
You must use double quotes when executing PHP functions from within WP All Import.
You would have this:
[preg_replace("/\d\d/", "99", {something[1]})]
// or
[preg_replace("#\d\d#", "99", {something[1]})]
Hopefully you can work out the rest from there.

PHP Regex URL parsing issues preg_replace

I have a custom markup parsing function that has been working very well for many years. I recently discovered a bug that I hadn't noticed before and I haven't been able to fix it. If anyone can help me with this that'd be awesome. So I have a custom built forum and text based MMORPG and every input is sanitized and parsed for bbcode like markup. It'll also parse out URL's and make them into legit links that go to an exit page with a disclaimer that you're leaving the site... So the issue that I'm having is that when I user posts multiple URL's in a text box (let's say \n delimited) it'll only convert every other URL into a link. Here's the parser for URL's:
$markup = preg_replace("/(^|[^=\"\/])\b((\w+:\/\/|www\.)[^\s<]+)" . "((\W+|\b)([\s<]|$))/ei", '"$1".shortURL("$2")."$4"', $markup);
As you can see it calls a PHP function, but that's not the issue here. Then entire text block is passed into this preg_replace at the same time rather than line by line or any other means.
If there's a simpler way of writing this preg_replace, please let me know
If you can figure out why this is only parsing every other URL, that's my ultimate goal here
Example INPUT:
http://skylnk.co/tRRTnb
http://skylnk.co/hkIJBT
http://skylnk.co/vUMGQo
http://skylnk.co/USOLfW
http://skylnk.co/BPlaJl
http://skylnk.co/tqcPbL
http://skylnk.co/jJTjRs
http://skylnk.co/itmhJs
http://skylnk.co/llUBAR
http://skylnk.co/XDJZxD
Example OUTPUT:
http://skylnk.co/tRRTnb
<br>http://skylnk.co/hkIJBT
<br>http://skylnk.co/vUMGQo
<br>http://skylnk.co/USOLfW
<br>http://skylnk.co/BPlaJl
<br>http://skylnk.co/tqcPbL
<br>http://skylnk.co/jJTjRs
<br>http://skylnk.co/itmhJs
<br>http://skylnk.co/llUBAR
<br>http://skylnk.co/XDJZxD
<br>
e flag in preg_replace is deprecated. You can use preg_replace_callback to access the same functionality.
i flag is useless here, since \w already matches both upper case and lower case, and there is no backreference in your pattern.
I set m flag, which makes the ^ and $ matches the beginning and the end of a line, rather than the beginning and the end of the entire string. This should fix your weird problem of matching every other line.
I also make some of the groups non-capturing (?:pattern) - since the bigger capturing groups have captured the text already.
The code below is not tested. I only tested the regex on regex tester.
preg_replace_callback(
"/(^|[^=\"\/])\b((?:\w+:\/\/|www\.)[^\s<]+)((?:\W+|\b)(?:[\s<]|$))/m",
function ($m) {
return "$m[1]".shortURL($m[2])."$m[3]";
},
$markup
);

Matching all three kinds of PHP comments with a regular expression

I need to match all three types of comments that PHP might have:
# Single line comment
// Single line comment
/* Multi-line comments */
 
/**
* And all of its possible variations
*/
Something I should mention: I am doing this in order to be able to recognize if a PHP closing tag (?>) is inside a comment or not. If it is then ignore it, and if not then make it count as one. This is going to be used inside an XML document in order to improve Sublime Text's recognition of the closing tag (because it's driving me nuts!). I tried to achieve this a couple of hours, but I wasn't able. How can I translate for it to work with XML?
So if you could also include the if-then-else login I would really appreciate it. BTW, I really need it to be in pure regular expression expression, no language features or anything. :)
Like Eicon reminded me, I need all of them to be able to match at the start of the line, or at the end of a piece of code, so I also need the following with all of them:
<?php
echo 'something'; # this is a comment
?>
Parsing a programming language seems too much for regexes to do. You should probably look for a PHP parser.
But these would be the regexes you are looking for. I assume for all of them that you use the DOTALL or SINGLELINE option (although the first two would work without it as well):
~#[^\r\n]*~
~//[^\r\n]*~
~/\*.*?\*/~s
Note that any of these will cause problems, if the comment-delimiting characters appear in a string or somewhere else, where they do not actually open a comment.
You can also combine all of these into one regex:
~(?:#|//)[^\r\n]*|/\*.*?\*/~s
If you use some tool or language that does not require delimiters (like Java or C#), remove those ~. In this case you will also have to apply the DOTALL option differently. But without knowing where you are going to use this, I cannot tell you how.
If you cannot/do not want to set the DOTALL option, this would be equivalent (I also left out the delimiters to give an example):
(?:#|//)[^\r\n]*|/\*[\s\S]*?\*/
See here for a working demo.
Now if you also want to capture the contents of the comments in a group, then you could do this
(?|(?:#|//)([^\r\n]*)|/\*([\s\S]*?)\*/)
Regardless of the type of comment, the comments content (without the syntax delimiters) will be found in capture 1.
Another working demo.
Single-line comments
singleLineComment = /'[^']*'|"[^"]*"|((?:#|\/\/).*$)/gm
With this regex you have to replace (or remove) everything that was captured by ((?:#|\/\/).*$). This regex will ignore contents of strings that would look like comments (e.g. $x = "You are the #1"; or $y = "You can start comments with // or # in PHP, but I'm a code string";)
Multiline comments
multilineComment = /^\s*\/\*\*?[^!][.\s\t\S\n\r]*?\*\//gm

Fetch All URLs from a Page using Regex

Original format:
<a href="http://www.example.com/t434234.html" ...>
1. I need to fetch all URLs of this format:
http://www.example.com/t[ANY CHARACTER].html
ANY CHARACTER is where value changes from URL to another. The rest are fixed.
Here is my attempt:
preg_match("#http:\/\/www\.aqarcity\.com\/t[a-zA-Z0-9_]\.html#", $page, $urls);
I get empty results. I don't know where i went wrong...
The problem appears to be that [a-zA-Z0-9_] will only match exactly one character. If you want to match zero or more characters, use [a-zA-Z0-9_]*. For one or more, use [a-zA-Z0-9_]+. For exactly six characters, use [a-zA-Z0-9_]{6}. For e.g. one to six characters, use [a-zA-Z0-9_]{1,6}.
Also note that, since you're using # as the delimiter, you don't need to escape the / characters. As far as I know this will not make your code misbehave, but it'll be easier to read if you remove the backslashes before the slashes.
Finally, please realize that regular expressions are a rather dangerous way to work with HTML. In this case, you may pick up matching URLs from comments, Javascript code, and other things that aren't links. It is literally impossible to correctly parse HTML with unaugmented regular expressions—they don't have the expressive power necessary to do so. I don't know what sorts of HTML parsers are available for PHP, but you may want to look into them.

PHP regex for filtering out urls from specific domains for use in a vBulletin plug-in

I'm trying to put together a plug-in for vBulletin to filter out links to filesharing sites. But, as I'm sure you often hear, I'm a newb to php let alone regexes.
Basically, I'm trying to put together a regex and use a preg_replace to find any urls that are from these domains and replace the entire link with a message that they aren't allowed. I'd want it to find the link whether it's hyperlinked, posted as plain text, or enclosed in [CODE] bb tags.
As for regex, I would need it to find URLS with the following, I think:
Starts with http or an anchor tag. I believe that the URLS in [CODE] tags could be processed the same as the plain text URLS and it's fine if the replacement ends up inside the [CODE] tag afterward.
Could contain any number of any characters before the domain/word
Has the domain somewhere in the middle
Could contain any number of any characters after the domain
Ends with a number of extentions such as (html|htm|rar|zip|001) or in a closing anchor tag.
I have a feeling that it's numbers 2 and 4 that are tripping me up (if not much more). I found a similar question on here and tried to pick apart the code a bit (even though I didn't really understand it). I now have this which I thought might work, but it doesn't:
<?php
$filterthese = array('domain1', 'domain2', 'domain3');
$replacement = 'LINKS HAVE BEEN FILTERED MESSAGE';
$regex = array('!^http+([a-z0-9-]+\.)*$filterthese+([a-z0-9-]+\.)*(html|htm|rar|zip|001)$!',
'!^<a+([a-z0-9-]+\.)*$filterthese+([a-z0-9-]+\.)*</a>$!');
$this->post['message'] = preg_replace($regex, $replacement, $this->post['message']);
?>
I have a feeling that I'm way off base here, and I admit that I don't fully understand php let alone regexes. I'm open to any suggestions on how to do this better, how to just make it work, or links to RTM (though I've read up a bit and I'm going to continue).
Thanks.
You can use parse_url on the URLs and look into the hashmap it returns. That allows you to filter for domains or even finer-grained control.
I think you can avoid the overhead of this in using the filter_var built-in function.
You may use this feature since PHP 5.2.0.
$good_url = filter_var( filter_var( $raw_url, FILTER_SANITIZE_URL), FILTER_VALIDATE_URL);
Hmm, my first guess: You put $filterthese directly inside a single-quoted string. That single quotes don't allow for variable substitution. Also, the $filterthese is an array, that should first be joined:
var $filterthese = implode("|", $filterthese);
Maybe I'm way off, because I don't know anything about vBulletin plugins and their embedded magic, but that points seem worth a check to me.
Edit: OK, on re-checking your provided source, I think the regexp line should read like this:
$regex = '!(?#
possible "a" tag [start]: )(<a[^>]+href=["\']?)?(?#
offending link: )https?://(?#
possible subdomains: )(([a-z0-9-]+\.)*\.)?(?#
domains to block: )('.implode("|", $filterthese).')(?#
possible path: )(/[^ "\'>]*)?(?#
possible "a" tag [end]: )(["\']?[^>]*>)?!';

Categories