How to convert partially invalid JSON to a valid one? - php

I am using PHP to scrape a webpage and get this string:
'[{endTime:"2019-06-05T17:15:00.000+10:00",startTime:"2019-06-05T17:00:00.000+10:00"}]'
which is not valid JSON, the key names are encapsulated ...
I use preg_replace to create valid JSON:
$x = '[{endTime:"2019-06-05T17:15:00.000+10:00",startTime:"2019-06-05T17:00:00.000+10:00"}]';
$j = preg_replace('/(\w+)\s{0,1}:/', '"\1":', $x);
and get this value:
'[{"endTime":"2019-06-"05T17":"15":00.000+"10":00","startTime":"2019-06-"05T17":"00":00.000+"10":00"}]'
but I want this value:
'[{"endTime":"2019-06-05T17:15:00.000+10:00","startTime":"2019-06-05T17:00:00.000+10:00"}]'
How do I solve this problem?

RegEx 1
Your original expression seems to be fine; we would just slightly modify it to:
([{,])(\w+)(\s+)?:
and it might work. We are adding a left boundary:
([{,])
and a right boundary:
:
and our key attribute is in this capturing group:
(\w+)
RegEx 2
We can expand our first expression to:
([{,])(\s+)?(\w+)(\s+)?:
in case there are spaces before the key attribute:
Demo
Test 1
$re = '/([{,])(\w+)(\s+)?:/m';
$x = '[{endTime:"2019-06-05T17:15:00.000+10:00",startTime:"2019-06-05T17:00:00.000+10:00"}]';
$subst = '$1"$2":';
$result = preg_replace($re, $subst, $x);
echo $result;
Test 2
$re = '/([{,])(\s+)?(\w+)(\s+)?:/m';
$x = '[{endTime:"2019-06-05T17:15:00.000+10:00",startTime:"2019-06-05T17:00:00.000+10:00"}]';
$subst = '$1"$3":';
$result = preg_replace($re, $subst, $x);
echo $result;
Output
[{"endTime":"2019-06-05T17:15:00.000+10:00","startTime":"2019-06-05T17:00:00.000+10:00"}]

Use this pattern:
([{,])([^:]+):
It matches every run of non-colon characters that directly follows { or , and ends at a colon, i.e. each key.
Use this for the replacement:
$1"$2":
It will add a double quote on both sides of each key.
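Here is a minimal sketch applying this pattern to the string from the question. Because [^:]+ takes everything up to the colon, any whitespace around a key ends up inside the quotes. The second input below is made up purely to show that.

```php
<?php
$x = '[{endTime:"2019-06-05T17:15:00.000+10:00",startTime:"2019-06-05T17:00:00.000+10:00"}]';

// ([{,]) keeps the brace or comma, ([^:]+) grabs everything up to the next colon.
$fixed = preg_replace('/([{,])([^:]+):/', '$1"$2":', $x);
echo $fixed, "\n";

// Hypothetical input with spaces around the key: the spaces stay inside the quotes.
$spaced = preg_replace('/([{,])([^:]+):/', '$1"$2":', '[{ endTime :"x"}]');
echo $spaced, "\n"; // [{" endTime ":"x"}]
```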

Related

How to remove 2 last characters with preg_replace?

I have a code like 784XX, where XX could be letters or digits, and I need an expression that removes the last 2 characters (XX) using only preg_replace.
How can I do that?
For example, the expected output is:
782A3 → 782
0012122 → 00121
76542A → 7654
333333CD → 333333
You can use the substr function.
But if you want to use preg_replace, you can do this:
$val = preg_replace('/\w{2}$/', '', $val);
I'm pretty sure there are much easier ways to do this task, but if we wish to use regular expressions, we could start with a simple expression such as:
(.+)?(..)
If I understand the problem right, our desired output is in this capturing group:
(.+)
Demo
$re = '/(.+)?(..)/m';
$str = '782A3
0012122
76542A
333333CD';
$subst = '$1';
$result = preg_replace($re, $subst, $str);
echo $result;
Advice
AbraCadaver's advice in the comments is a much better way:
substr('784XX', 0, -2);
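As a quick sanity check, the sketch below runs both approaches over the sample codes from the question and confirms they agree. Note that substr() counts bytes, which is fine for plain ASCII codes like these.

```php
<?php
// Sample codes and expected results from the question.
$cases = [
    '782A3'    => '782',
    '0012122'  => '00121',
    '76542A'   => '7654',
    '333333CD' => '333333',
];

$results = [];
foreach ($cases as $code => $expected) {
    $code = (string) $code;
    $viaSubstr = substr($code, 0, -2);                // drop the last two characters
    $viaRegex  = preg_replace('/\w{2}$/', '', $code); // the same with a regex
    $results[$code] = [$viaSubstr, $viaRegex];
    echo "$code: $viaSubstr / $viaRegex (expected $expected)\n";
}
```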

RegEx for adding a space in a special pattern

Quick note: I know markdown parsers don't care about this issue. It's for the sake of visual consistency in the md file and also experimentation.
Sample:
# this
##that
###or this other
Goal: read each line and, if a Markdown header does not have a space after the pound/hash sign, add one so that it looks like:
# this
## that
### or this other
My non-regex attempt:
function inelegantFunction(string $string) {
    $array = explode('#', $string);
    $num = count($array);
    $text = end($array);
    return str_repeat('#', $num - 1) . " " . $text;
}
echo inelegantFunction("###or this other");
// returns ### or this other
This works, but it has no mechanism to handle the unlikely case of seven '#' characters.
Regardless of efficacy, I would like to figure out how to do this with a regex in PHP (and perhaps JavaScript, if that matters).
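Try to match (?m)^#++\K\S, which matches lines starting with one or more number signs followed by a non-space character. \K drops the #s from the match, so replacing the match with a space followed by $0 inserts a space before that character in your function: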
return preg_replace('~(?m)^#++\K\S~', ' $0', $string);
To limit the number of #s to six, use:
(?m)^(?!#{7})#++\K\S
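Here is how the six-# version might look inside the asker's function. The name addHeaderSpace is hypothetical, and the sample input, including the made-up seven-# line, is only for illustration.

```php
<?php
// Insert a space after the leading #s of a header line when it is missing.
// (?m) makes ^ match at every line start, (?!#{7}) skips lines with seven or more #s,
// #++ possessively consumes the #s, and \K drops them so only \S is replaced.
function addHeaderSpace(string $string): string
{
    return preg_replace('~(?m)^(?!#{7})#++\K\S~', ' $0', $string);
}

$input = "# this\n##that\n###or this other\n#######seven";
echo addHeaderSpace($input);
```

\K and possessive quantifiers are PCRE features that JavaScript's RegExp does not support. A JavaScript version would need to capture the #s and put them back in the replacement instead.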
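I'm guessing that a simple expression with a character class as the right boundary might work here:
(#)([a-z])
If other characters can follow the #, we can simply add them to [a-z]. Note that the pattern is not anchored to the start of a line, so it also matches a # followed by a lowercase letter elsewhere in the text.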
Demo
Test
$re = '/(#)([a-z])/m';
$str = '#this
##that
###that
### or this other';
$subst = '$1 $2';
$result = preg_replace($re, $subst, $str);
echo "The result of the substitution is ".$result;

Regex replace recursive with one pattern

$array[key][key]...[key]
should be replaced with
$array['key']['key']...['key']
I only managed to add quotes to the first key of the array:
\$([a-zA-Z0-9]+)\[([a-zA-Z_-]+[0-9]*)\] replaced with \$\1\[\'\2\3\'\]
You may use a regex that matches consecutively rather than recursively:
$re = '/(\$\w+|(?!^)\G)\[([^]]*)\]/';
$str = "\$array[key][key][key]";
$subst = "$1['$2']";
$result = preg_replace($re, $subst, $str);
echo $result;
The regex (\$\w+|(?!^)\G)\[([^]]*)\] matches every bracketed substring (capturing its contents into Group 2 with \[([^]]*)\]) that either comes right after a $ followed by word characters (the \$\w+ part) or directly follows the previous match (thanks to (?!^)\G: \G asserts the position where the previous match ended, and (?!^) stops it from matching at the start of the string).
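To show what (?!^)\G buys you, the sketch below runs the same pattern on a made-up string that also contains a second variable and a stray [d]. Brackets chained to a $name are quoted. The stray [d] is left alone because it neither follows a $name nor continues the previous match.

```php
<?php
$re = '/(\$\w+|(?!^)\G)\[([^]]*)\]/';

// Single quotes so PHP does not interpolate $array / $a.
$str = '$array[key][key][key] and $a[b][c] + [d]';
$result = preg_replace($re, "$1['$2']", $str);
echo $result; // $array['key']['key']['key'] and $a['b']['c'] + [d]
```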
You shouldn't need anything fancy: just match the part you need, then do the replacement in a callback.
Untested:
$new_input = preg_replace_callback('/(?i)\$[a-z]+\K(?:\[[^\[\]]*\])+/',
function( $matches ){
return preg_replace( '/(\[)|(\])/', "$1'$2", $matches[0]);
},
$input );
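Below is the callback answer wrapped into a self-contained script. The sample $input from the question is the only addition.

```php
<?php
$input = '$array[key][key][key]';

// Match the run of [..] groups right after $name (\K keeps $name out of the match),
// then quote each key inside the callback.
$new_input = preg_replace_callback(
    '/(?i)\$[a-z]+\K(?:\[[^\[\]]*\])+/',
    function (array $matches): string {
        // "[" becomes "['" and "]" becomes "']"
        return preg_replace('/(\[)|(\])/', "$1'$2", $matches[0]);
    },
    $input
);

echo $new_input; // $array['key']['key']['key']
```

One difference from the \G answer: \$[a-z]+ accepts only letters, so a variable like $my_array would not be matched unless the name part is widened (for example to [a-z_]\w*).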

Encode equal sign in query string with regex

I have a query string that may look like one of the following:
?key=aa=bb
?key=aa=bb=cc
?key=aa=bb&key2=cc
etc.
What I want to do is replace the equal sign in the value part only. So it should result in this:
?key=aa%3dbb
?key=aa%3dbb%3dcc
?key=aa%3dbb&key2=cc
I'm trying to do that with the following regex, using a lookahead, but it's not doing anything:
echo preg_replace("/=(?=[^&])=/", "%3d", 'http://www.example.com?key=aaa=bbb=ccc&key3=dddd');
How can I make this work?
(\bkey\d*)=(*SKIP)(*F)|=
Try this. See the demo (note that the pattern assumes every parameter name starts with key):
https://regex101.com/r/hR7tH4/13
$re = "/(\\bkey\\d*)=(*SKIP)(*F)|=/m";
$str = "\n ?key=aa=bb\n ?key=aa=bb=cc\n ?key=aa=bb&key2=cc\n";
$subst = "%3d";
$result = preg_replace($re, $subst, $str);
You don't need a regex; use the proper tools: parse_url() to get the query string (and whatever else you want), then parse_str() to get an array of the variables and their values. http_build_query() will then encode them for you:
$query = parse_url('http://www.example.com?key=aaa=bbb=ccc&key3=dddd', PHP_URL_QUERY);
parse_str($query, $array);
$result = http_build_query($array);
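Here is the same approach as a runnable script with the intermediate values spelled out. http_build_query() emits uppercase %3D rather than %3d. Percent-encoding hex digits are case-insensitive, so the two are equivalent.

```php
<?php
$url = 'http://www.example.com?key=aaa=bbb=ccc&key3=dddd';

// Extract just the query part: "key=aaa=bbb=ccc&key3=dddd"
$query = parse_url($url, PHP_URL_QUERY);

// parse_str splits on "&", then on the first "=", so 'key' => 'aaa=bbb=ccc'
parse_str($query, $array);

// http_build_query re-encodes each value, turning the extra "=" into %3D
$result = http_build_query($array);
echo $result; // key=aaa%3Dbbb%3Dccc&key3=dddd
```

parse_str() also rewrites dots and spaces in parameter names to underscores, so the rebuilt query can differ from the input in those cases.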
Here is another version of a regex based on the same approach as vks':
[&?][^&=]+=(*SKIP)(*FAIL)|=
Regex explanation:
[&?] - Match & or ? literally
[^&=]+ - Match one or more characters other than & and =
= - Match = (so, we matched a key)
(*SKIP)(*FAIL) - Verbs that make the match fail at this point and resume searching after it (so the = we found after the key is not replaced)
= - Match any other =, which will be replaced with %3d.
Here is an IDEONE demo:
$re = "/[&?][^&=]+=(*SKIP)(*FAIL)|=/";
$str = "http://google.com?key=aa=bb\nhttp://google.com?key=aa=bb=cc\nhttp://google.com?key=aa=bb&key2=cc";
$result = preg_replace($re, "%3d", $str);
echo $result;
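The sketch below compares the two (*SKIP)(*FAIL) answers. It runs vks' pattern and the [&?] version on the question's URL and on a made-up URL whose parameter is not named key. Only the [&?] version protects the first = after an arbitrary parameter name.

```php
<?php
// vks' pattern: protects only an "=" that follows a name starting with "key".
$vks   = '/(\bkey\d*)=(*SKIP)(*F)|=/';
// [&?] version: protects the first "=" after any parameter name.
$other = '/[&?][^&=]+=(*SKIP)(*FAIL)|=/';

$url = 'http://www.example.com?key=aaa=bbb=ccc&key3=dddd';
echo preg_replace($vks, '%3d', $url), "\n";
echo preg_replace($other, '%3d', $url), "\n";

// Hypothetical URL whose parameter is not called "key...".
$url2 = 'http://www.example.com?foo=a=b';
echo preg_replace($vks, '%3d', $url2), "\n";   // http://www.example.com?foo%3da%3db
echo preg_replace($other, '%3d', $url2), "\n"; // http://www.example.com?foo=a%3db
```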
What about this?
preg_replace_callback("/=([^&$]+)/", "myReplace", "http://www.example.com?key=aaa=bbb=ccc&key3=dddd");
function myReplace($matches) {
return "=" . urlencode($matches[1]);
}
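Here is the same idea written with an anonymous function instead of the named myReplace. The result is captured so it can be printed. urlencode() encodes everything that is not URL-safe, so an already-encoded value would be double-encoded (% becomes %25).

```php
<?php
$url = 'http://www.example.com?key=aaa=bbb=ccc&key3=dddd';

// Each "=" plus the value after it (up to the next "&") is matched once;
// urlencode() then escapes any "=" left inside that value as %3D.
$result = preg_replace_callback(
    '/=([^&$]+)/',
    function (array $matches): string {
        return '=' . urlencode($matches[1]);
    },
    $url
);
echo $result; // http://www.example.com?key=aaa%3Dbbb%3Dccc&key3=dddd
```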
Just going to add an addendum explaining your specific attempt:
preg_replace("/=(?=[^&])=/",
While the lookahead was a nice idea, it is zero-width: it only asserts that the very next character is not &, without consuming it. The pattern therefore still needs the second = to follow the first one immediately, so it can only ever match ==, which your input never contains.
You could refashion it into:
preg_replace("/=([^&=]+)\K=/",
↑
Which I guess is what you tried. Note that this merely ignores every second =………= equal sign. So would only suit your simple example query strings, not more plentiful unescaped characters within.
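The "every second =" limitation is easy to see on the URL from the question. The first extra = is encoded, but the one between bbb and ccc survives.

```php
<?php
$url = 'http://www.example.com?key=aaa=bbb=ccc&key3=dddd';

// "=aaa" is consumed, then \K restarts the match so only the next "=" is replaced.
// Matching resumes after that "=", so "bbb=ccc" never gets a leading "=" to anchor on.
$result = preg_replace('/=([^&=]+)\K=/', '%3d', $url);
echo $result; // http://www.example.com?key=aaa%3dbbb=ccc&key3=dddd
```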

Regular expressions replace

I need to remove a
[0037][user name]
combination from a sentence. The first pair of brackets always contains digits,
e.g.
[0032]
and the digit count will never exceed 4. The second pair of brackets always contains letters, e.g.
[first name]
Does anyone have an idea how to do this?
You can use preg_replace() with the following regular expression:
$str = preg_replace('/\[\d+]\[[a-z ]+]/i', '', $str);
\[\d{1,4}\]\[[a-zA-Z ]+\]
This should do it. Replace the match with an empty string. See demo:
http://regex101.com/r/oE6jJ1/22
$re = "/\\[\\d{1,4}\\]\\[[a-zA-Z ]+\\]/im";
$str = "asdas asdsad [1234][asd asd] asdasd";
$subst = "";
$result = preg_replace($re, $subst, $str);
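Removing the tag leaves the spaces on both sides of it in place, so the sentence ends up with a double space. The sketch below shows that with the answer's pattern. It then shows a variation that is not from the answers: adding \s* in front also swallows the preceding whitespace.

```php
<?php
$str = 'asdas asdsad [1234][asd asd] asdasd';

// Pattern from the answer: up to 4 digits in the first brackets, letters/spaces in the second.
$removed = preg_replace('/\[\d{1,4}\]\[[a-zA-Z ]+\]/', '', $str);
echo $removed, "\n"; // "asdas asdsad  asdasd" (two spaces left behind)

// Variation (an assumption, not from the answers): also remove whitespace before the tag.
$tidy = preg_replace('/\s*\[\d{1,4}\]\[[a-zA-Z ]+\]/', '', $str);
echo $tidy, "\n";    // "asdas asdsad asdasd"
```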
