RegEx in PHP to extract components of nquad - php

I'm looking for a regex that can help me parse an nquad file. An nquad file is a plain text file where each line represents a quad (s, p, o, c):
<http://mysubject> <http://mypredicate> <http://myobject> <http://mycontext> .
<http://mysubject> <http://mypredicate2> <http://myobject2> <http://mycontext> .
<http://mysubject> <http://mypredicate2> <http://myobject2> <http://mycontext> .
The objects can also be literals (instead of URIs), in which case they are enclosed in double quotes:
<http://mysubject> <http://mypredicate> "My object" <http://mycontext> .
I'm looking for a regex that, given one line of this file, will give me back a PHP array in the following format:
[0] => "http://mysubject"
[1] => "http://mypredicate"
[2] => "http://myobject"
[3] => "http://mycontext"
...or in the case where the double quotes are used for the object:
[0] => "http://mysubject"
[1] => "http://mypredicate"
[2] => "My object"
[3] => "http://mycontext"
One final thing: ideally, the regex should cater for the scenario where there may be one or more spaces between the various components, e.g.
<http://mysubject>   <http://mypredicate>  "My object"    <http://mycontext> .

I'm going to add another answer as an additional solution using only a regex and explode:
$line = "<http://mysubject> <http://mypredicate> <http://myobject> <http://mycontext>";
$line2 = '<http://mysubject> <http://mypredicate> "My object" <http://mycontext>';
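$delimiter = '---'; // Can't use a space: literal objects such as "My object" contain spaces
// Capture s, p, o, c (the object may be <uri> or "literal"); \s*\.? also consumes an optional trailing " ."
$result = preg_replace('/<([^>]*)>\s+<([^>]*)>\s+["<]([^">]*)[">]\s+<([^>]*)>\s*\.?/', '$1' . $delimiter . '$2' . $delimiter . '$3' . $delimiter . '$4', $line);
$array = explode($delimiter, $result);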

It seems this can be accomplished as follows (I do not know your character restrictions so it may not work specifically for your needs, but worked for your test cases):
$line = "<http://mysubject> <http://mypredicate> <http://myobject> <http://mycontext>";
$line2 = '<http://mysubject> <http://mypredicate> "My object" <http://mycontext>';
// Replace the whitespace between entries with a delimiter (change $line to $line2 to test the literal case)
$delimiter = '---';
$result = preg_replace('/([">])\s+(["<])/', '$1' . $delimiter . '$2', $line);
// Explode on our delimiter
$array = explode($delimiter, $result);
foreach ($array as &$a)
{
// Strip the surrounding <, >, " and any trailing " ." from the ends only,
// so dots inside URIs (e.g. example.org) are kept
$a = trim($a, '<>". ');
}
unset($a); // break the reference left behind by foreach
var_dump($array);

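This regular expression would help:
/(\S+?)\s+(\S+?)\s+(\S+?)\s+(\S+?)\s+\./
The (s, p, o, c) values will be in $1, $2, $3 and $4. Note that the captures include the surrounding < > or double quotes, and that a literal object containing spaces (such as "My object") is split into the wrong components.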
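A possible variant (a sketch, not taken from any of the answers above) is to use preg_match() with an alternation for the object, so the four parts come back directly without a delimiter round-trip. It assumes URIs are wrapped in < > and literals in plain double quotes with no escaped quotes inside, and it does not handle the datatype or language tags that real N-Quads literals may carry. The function name parseNquadLine() is hypothetical. PREG_UNMATCHED_AS_NULL requires PHP 7.2+.

```php
<?php
// Sketch: parse one N-Quad line into [subject, predicate, object, context].
// Assumption: URIs look like <...>, literals like "..." (no escaped quotes, no ^^type or @lang).
function parseNquadLine(string $line): ?array
{
    // Group 3 matches a URI object, group 4 a literal object; only one of them participates.
    // \s+ allows one or more spaces between components; the trailing "." is optional.
    $pattern = '/^\s*<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s+<([^>]*)>\s*\.?\s*$/';
    if (!preg_match($pattern, $line, $m, PREG_UNMATCHED_AS_NULL)) {
        return null; // the line is not in the expected shape
    }
    // With PREG_UNMATCHED_AS_NULL the non-participating group is null, so ?? picks the other one.
    return [$m[1], $m[2], $m[3] ?? $m[4], $m[5]];
}

$lines = [
    '<http://mysubject> <http://mypredicate> <http://myobject> <http://mycontext> .',
    '<http://mysubject>   <http://mypredicate>  "My object"   <http://mycontext> .',
];
foreach ($lines as $line) {
    print_r(parseNquadLine($line));
}
```

Returning null for non-matching lines lets a caller skip blank lines or comments while reading a file line by line.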

Related

array_unique does not work on arabic alphabet strings

I am trying to remove duplicate values from an Arabic string input, but it doesn't work:
$input = "اللہ";
$step1 = preg_split("/(?<!^)(?!$)/u", $input);
$step2 = implode(' ',step1);
$step3 = array_unique(step2 );
echo "step3";
I need output like this:
ا ل ھ
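Can you try this?
$input = "اللہ";
// Split into characters; the u modifier makes the split UTF-8 aware
$step1 = preg_split("/(?<!^)(?!$)/u", $input);
// array_unique() needs an array, so apply it before imploding
// (str_split() would cut the string into bytes, not characters)
$step2 = array_unique($step1);
$result = implode(' ', $step2);
echo $result;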
Instead of running a regular expression, you could use the multibyte function mb_str_split() from the mbstring extension (available since PHP 7.4). It is designed to handle multibyte charsets such as UTF-8. Older PHP configurations could overload the standard string functions with their mb_* equivalents (mbstring.func_overload, deprecated in PHP 7.2 and removed in PHP 8.0), but the easiest and most portable approach is to call the mb_* functions directly.
Secondly, as @CBroe pointed out, you made a few mistakes in your code: array_unique() expects an array, but you passed it a string, and several variables are missing their $ sign.
This is what you can do:
<?php
header('Content-Type: text/plain; charset=UTF-8');
$input = "اللہ";
$step1 = mb_str_split($input, 1, 'UTF-8');
print 'mb_str_split() returns ' . var_export($step1, true) . PHP_EOL;
$step2 = array_unique($step1);
print 'array_unique() returns ' . var_export($step2, true) . PHP_EOL;
print 'Desired output string is "' . implode(' ', $step2) . '"' . PHP_EOL;
Output:
mb_str_split() returns array (
0 => 'ا',
1 => 'ل',
2 => 'ل',
3 => 'ہ',
)
array_unique() returns array (
0 => 'ا',
1 => 'ل',
3 => 'ہ',
)
Desired output string is "ا ل ہ"
You can run it here: https://onlinephp.io/c/fe990
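If mb_str_split() is not available (PHP older than 7.4), a similar result can be had with preg_split() and the u modifier. This is a sketch under that assumption; uniqueChars() is a hypothetical helper name.

```php
<?php
// Sketch for PHP < 7.4: split a UTF-8 string into characters, then drop duplicates.
function uniqueChars(string $input): string
{
    // An empty pattern with /u splits between every character;
    // PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
    $chars = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY);
    // array_unique() keeps the first occurrence of each value.
    return implode(' ', array_unique($chars));
}

echo uniqueChars("اللہ"), PHP_EOL;
```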

PHP - Sanitize string removes numbers

I am attempting to define allowed characters in an array, and then sanitize strings based on this array. The code below works pretty well, except that it removes the characters 0-9 too!
Could someone please explain why this is?
Code:
<?php
//Allowed characters within user data:
$symbols = array();
$symbols += range('a', 'z');
$symbols += range('A', 'Z');
$symbols += range('0', '9');
array_push($symbols,' ','-'); // Allow spaces and hyphens.
//----test 1
//data to test.
$someString = "07mm04dd1776yyyy";
//sanitize
$someString = trim(preg_replace("/[^" . preg_quote(implode('',$symbols), '/') . "]/i", "", $someString));
echo "$someString\n";
//----test 2
$someString = "Another-07/04/1776-test-!##$%^&*()[]\\;',./\"[]|;\"<>?";
//sanitize
$someString = trim(preg_replace("/[^" . preg_quote(implode('',$symbols), '/') . "]/i", "", $someString));
echo "$someString\n";
?>
Output:
mmddyyyy
Another--test-
Side note (edit): This is used with a database, but it goes beyond the DB: the data in the DB is used to write PowerShell scripts that import users into Active Directory, where many characters are not allowed. The old system also only allowed these characters.
Thank you in advance,
Wayne
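A short sketch (not part of the original thread) reproducing why the digits disappear: the + operator on arrays is a union by key, so values from the right-hand array whose keys already exist on the left are silently skipped.

```php
<?php
// Sketch: $symbols += range(...) only adds values for keys that are not already present.
$symbols = range('a', 'z');   // keys 0..25
$symbols += range('A', 'Z');  // keys 0..25 already taken: nothing is added
$symbols += range('0', '9');  // keys 0..9 already taken: nothing is added
var_dump(count($symbols));    // int(26)
```

Uppercase letters still survived in the question's output only because the pattern used the i modifier.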
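Going off of what @andrewsi said about the allowed characters not being added to the array (+ on arrays is a union by key, and range('A', 'Z') and range('0', '9') reuse keys 0-25 and 0-9, which range('a', 'z') already occupies), I figured out how to add them properly. The code below shows they are added, along with the outputs of the test strings.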
There's probably a better way to do this, so I added it to the community wiki.
<?php
//Allowed characters within user data:
$symbols = array();
array_push($symbols,implode("",range('0', '9')));
array_push($symbols,implode("",range('a', 'z')));
array_push($symbols,implode("",range('A', 'Z')));
array_push($symbols,' ','-'); // Allow spaces and hyphens.
print_r($symbols);
echo "\n";
//----test 1
//data to test.
$someString = "07mm04dd1776yyyy";
//sanitize
$someString = trim(preg_replace("/[^" . preg_quote(implode('',$symbols), '/') . "]/", "", $someString));
echo "$someString\n";
//----test 2
$someString = "Another-07/04/1776-test-!##$%^&*()[]\\;',./\"[]|;\"<>?";
//sanitize
$someString = trim(preg_replace("/[^" . preg_quote(implode('',$symbols), '/') . "]/", "", $someString));
echo "$someString\n";
?>
Output:
Array
(
[0] => 0123456789
[1] => abcdefghijklmnopqrstuvwxyz
[2] => ABCDEFGHIJKLMNOPQRSTUVWXYZ
[3] =>
[4] => -
)
07mm04dd1776yyyy
Another-07041776-test-
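Another option, sketched here as an assumption rather than taken from the thread, is to keep one character per element but build the list with array_merge(), which appends values with numeric keys instead of skipping them. sanitize() is a hypothetical helper name.

```php
<?php
// Sketch: array_merge() renumbers numeric keys, so every range is kept.
$symbols = array_merge(range('a', 'z'), range('A', 'Z'), range('0', '9'), [' ', '-']);

function sanitize(string $value, array $allowed): string
{
    // preg_quote() escapes '-' and any other regex metacharacters for the character class.
    $pattern = '/[^' . preg_quote(implode('', $allowed), '/') . ']/';
    return trim(preg_replace($pattern, '', $value));
}

echo sanitize("07mm04dd1776yyyy", $symbols), PHP_EOL;
echo sanitize("Another-07/04/1776-test-!", $symbols), PHP_EOL;
```

No i modifier is needed because both letter cases are in the allow-list explicitly.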

Make bold specific part of string

I have an array like
$array[]="This is a test";
$array[]="This is a TEST";
$array[]="TeSt this";
I need to make the string 'test' bold, like this:
$array[]="This is a <b>test</b>";
$array[]="This is a <b>TEST</b>";
$array[]="<b>TeSt</b> this";
I have tried str_replace(), but it is case-sensitive.
Note:
I need to make the given string bold while keeping its original case.
You can use the array_walk() function to replace the string values within an array. Check the code below:
function my_str_replace(&$item){
$item = preg_replace("/test/i", '<b>$0</b>', $item);
}
$array[]="This is a test";
$array[]="This is a TEST";
$array[]="TeSt this";
array_walk($array, 'my_str_replace');
EDIT: Based on John WH Smith's comment:
You can simply use $array = preg_replace("/test/i", '<b>$0</b>', $array); since preg_replace() also accepts an array as its subject and returns the modified array.
If you're looking for patterns instead of fixed strings like "test", have a look at regular expressions and preg_replace():
$str = preg_replace("#(test|otherword)#i", "<b>$1</b>", $str);
More about REGEXes :
http://en.wikipedia.org/wiki/Regular_expression
http://www.regular-expressions.info/
http://uk.php.net/preg_replace
Edit: added the i modifier after the regex to make it case-insensitive.
You can use a function like the one I wrote below:
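function wrap_text_with_tags( $haystack, $needle , $beginning_tag, $end_tag ) {
$needle_start = stripos($haystack, $needle);
if ($needle_start === false) {
return $haystack; // needle not found: return the string unchanged
}
$needle_length = strlen($needle);
// Reuse the matched text from the haystack so its original case is kept
$matched_text = substr($haystack, $needle_start, $needle_length);
$return_string = substr($haystack, 0, $needle_start) . $beginning_tag . $matched_text . $end_tag . substr($haystack, $needle_start + $needle_length);
return $return_string;
}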
So you'd be able to call it as follows:
$original_string = 'Writing PHP code can be fun!';
$return_string = wrap_text_with_tags( $original_string , 'PHP' , "<strong>" ,"</strong>");
When returned, the strings will look as follows:
Original String
Writing PHP code can be fun!
Modified Result
Writing <strong>PHP</strong> code can be fun!
This function only works on the FIRST instance of a string.
This is my solution. It also keeps all uppercase letters uppercase and all lowercase letters lowercase.
function wrapTextWithTags( $haystack, $needle , $tag ): string
{
$start = stripos($haystack, $needle); // stripos() is already case-insensitive
if ($start === false) {
return $haystack; // needle not found
}
$textPart = substr($haystack, $start, strlen($needle));
$boldPart = "<" . $tag . ">" . $textPart . "</" . $tag . ">";
// Wraps every occurrence that has exactly the same case as the first match
return str_replace($textPart, $boldPart, $haystack);
}
I find a preg_replace() call to be the most appropriate tool for this task because:
it can affect all elements in the array without writing a loop,
it can replace more than one substring within a string,
adding a case-insensitive flag (i) is an easy and intuitive adjustment,
adding word boundaries (\b) on either side of the "needle" word ensures that only whole words are replaced,
when replacing the full-string match ($0), no parentheses / capture groups are necessary.
Code:
$array = [
"This is a test",
"This is a TEST",
"Test this testy contest protest test!",
"TeSt this",
];
var_export(
preg_replace('/\btest\b/i', '<b>$0</b>', $array)
);
Output:
array (
0 => 'This is a <b>test</b>',
1 => 'This is a <b>TEST</b>',
2 => '<b>Test</b> this testy contest protest <b>test</b>!',
3 => '<b>TeSt</b> this',
)
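If the word to highlight comes from user input rather than being hard-coded, it should be escaped before being placed in the pattern. A minimal sketch of that, assuming the same whole-word, case-insensitive behavior as above; boldWord() is a hypothetical helper name.

```php
<?php
// Sketch: wrap every whole-word, case-insensitive occurrence of $needle in <b> tags,
// keeping each match's original case via $0.
function boldWord(array $items, string $needle): array
{
    // preg_quote() escapes regex metacharacters (and the / delimiter) in the needle.
    $pattern = '/\b' . preg_quote($needle, '/') . '\b/i';
    return preg_replace($pattern, '<b>$0</b>', $items);
}

print_r(boldWord(["This is a test", "This is a TEST", "TeSt this"], 'test'));
```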
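Try str_ireplace(), the case-insensitive version of str_replace():
$array = str_ireplace("test", "<b>test</b>", $array);
Note that str_ireplace() returns the modified array and inserts the replacement exactly as written, so "TEST" and "TeSt" both become "<b>test</b>"; a second call such as str_ireplace("TEST", "<b>TEST</b>", $array) would re-match the already wrapped text instead of restoring the case.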

Strings and Arrays not working like I thought

I'm trying to learn more about strings and arrays. I have this bit of code:
<?php
$states = "OH, VA, GA";
$arrayStates = explode(",", $states);
$exists = "GA";
print_r($arrayStates);
if (in_array($exists, $arrayStates)){
echo "<br/>" . $exists . " " . "exists.";
} else {
echo "<br/>" . $exists . " " . "doesn't exist.";
}
?>
According to my feeble mind, GA should exist in the array. If I put $exists = "OH", that works. But the screen is showing this:
Array ( [0] => OH [1] => VA [2] => GA )
GA doesn't exist.
What am I not understanding here?
The array contains the string " GA" with a space as the first character. That's not equal to "GA", which doesn't have a space.
You should either use explode(", ", $states) or call trim() on each element of the array:
$arrayStates = array_map('trim', explode(",", $states));
You need to explode with a space after the comma.
$arrayStates = explode(", ", $states);
You're splitting on "," but your text also has spaces, so after the split you have:
Array ( [0] => OH [1] => _VA [2] => _GA )
You can either split on ",_" (replace the underscore with a space),
or you can trim all values after the split, like:
foreach ($arrayStates as $k => $v) $arrayStates[$k] = trim($v);
That is because it is being divided by "," only, so your array contents are (note the leading spaces):
Array
(
[0] => OH
[1] =>  VA
[2] =>  GA
)
You need to use $arrayStates = explode(", ", $states);
In $arrayStates after applying explode(...) you have:
$arrayStates[0] stores "OH"
$arrayStates[1] stores " VA"
$arrayStates[2] stores " GA"
Note that index 2 stores " GA" (with a leading space) instead of "GA", because the explode() call splits on "," alone. To get your code working as intended, use ", " (note the space) as the explode() delimiter.
The explode method splits the string on the comma "," ONLY and does not remove the whitespace. As a result you end up comparing "GA" (your $exists) to " GA" (inside of the array, note the whitespace) =]
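A sketch of a more forgiving split (not from the answers above): preg_split() with a pattern that consumes the comma plus any whitespace around it, so the result is the same however many spaces follow each comma. Passing true as the third argument of in_array() makes the comparison strict.

```php
<?php
$states = "OH, VA, GA";
// \s*,\s* removes the comma and any surrounding whitespace in one step.
$arrayStates = preg_split('/\s*,\s*/', trim($states));
$exists = "GA";
if (in_array($exists, $arrayStates, true)) {
    echo $exists . " exists." . PHP_EOL;
} else {
    echo $exists . " doesn't exist." . PHP_EOL;
}
// Uneven spacing gives the same array.
print_r(preg_split('/\s*,\s*/', trim("OH ,VA,   GA")));
```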

Create Array out of a plain text variable?

I am trying to create an array from a plain text variable like so in php:
$arraycontent = "'foo', 'bar', 'hallo', 'world'";
print_r( array($arraycontent) );
But it outputs the entire string as [0]
Array ( [0] => 'foo', 'bar', 'hallo', 'world' )
I would like 'foo' to be [0]
bar to be [1] and so on. Any pointers? Is this even possible?
YIKES, why are these all so long?
Here's a one liner:
explode("', '", trim($arraycontent, "'"));
If your string were like this:
$arraycontent = "foo, bar, hallo, world";
With only commas separating the values, you could use explode(), like this:
$myArray = explode(", ", $arraycontent);
This will create an array of strings based on the separator you define, in this case ", ".
If you want to keep the string as is, you can use this:
$myArray = explode("', '", trim($arraycontent, "'"));
This will now use "', '" as the separator, and the trim() function removes the ' from the beginning and end of the string.
If this is PHP you could use:
$foo = "'foo', 'bar', 'hallo', 'world'";
function arrayFromString($string){
$sA = str_split($string); array_shift($sA); array_pop($sA);
return explode("', '", implode('', $sA));
}
print_r(arrayFromString($foo));
eval('$array = array('.$arraycontent.');');
Would be the shortest way.
$array = explode(',', $arraycontent);
$mytrim = function($string) { return trim($string, " '"); };
$array = array_map($mytrim, $array);
This is a safer and therefore better approach, since eval() executes the string as PHP code. If your input contains other whitespace characters (such as tabs), you would have to add them to the $mytrim anonymous function.
Here is a variant that would work for the input example, but there might be corner cases that are not handled as wanted.
$arraycontent = "'foo', 'bar', 'hallo', 'world'";
$arrayparts = explode(',', $arraycontent); // split to "'foo'", " 'bar'", " 'hallo'", " 'world'"
foreach ($arrayparts as $k => $v) $arrayparts[$k] = trim($v, " '"); // remove single quotes and spaces at the beginning and end
print_r( $arrayparts ); // Array ( [0] => foo [1] => bar [2] => hallo [3] => world )
This should give what you want, but also note that, for example,
$arraycontent = " ' foo ' , ' bar ' ' ' ' ', 'hallo', 'world'";
would give the same output, so the question then becomes: how strict is the $arraycontent input?
If you have to have input like this, I suggest trimming it and using preg_split().
$arraycontent = "'foo', 'bar', 'hallo', 'world'";
$trimmed = trim($arraycontent, " \t\n\r\0\x0B'\""); // Trim whitespaces and " and '
$result = preg_split("#['\"]\s*,\s*['\"]#", $trimmed); // Split that string with regex!
print_r($result); // Yeah, baby!
EDIT: Also I might add that my solution is significantly faster (and more universal) than the others'.
That universality resides in:
It can recognize both " and ' as correct quotes and
It ignores the extra spaces before, in and after quoted text; not inside of it.
It seems you are having a problem creating the array.
Try using:
<?php
$arraycontent = array('foo', 'bar', 'hallo', 'world');
print_r( array($arraycontent) );
?>
its output will be:
Array ( [0] => Array ( [0] => foo [1] => bar [2] => hallo [3] => world ) )
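One more option, sketched here as an alternative to the answers above: extract every single-quoted item with preg_match_all(). Unlike eval(), it never executes the input. It assumes items never contain an escaped single quote, and unlike the trim() variants it keeps any spaces inside the quotes.

```php
<?php
$arraycontent = "'foo', 'bar', 'hallo', 'world'";
// Each match is a quoted item; capture group 1 holds the text between the quotes.
preg_match_all("/'([^']*)'/", $arraycontent, $matches);
$array = $matches[1];
print_r($array);
```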
