Splitting up a string with no common delimiter - php

I am trying to split up a string like this:
22/9 14 ALTERNATE (16) myXMG fe (2)infernoHitbox Arena
I want to get 22/9 14, then ALTERNATE, then (16), etc., but I also need to split (2), inferno and Hitbox Arena. The problem is that there's no common delimiter across the string to get all of these out. I will also need to split other strings in a similar fashion: same order of information, just different content.
I'm struggling for ideas. Any help would be appreciated!

This can only be done reliably with a regular expression. You could work with substr, but that only works if the fields are always at the same positions.
So you should have a look at preg_match.
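
A minimal sketch of the preg_match approach, for illustration only. It assumes (this is a guess from the single sample line, not something the question confirms) that every line starts with a date-like token and a number, that the counts always appear in parentheses, and that the name after the second count is all lowercase and runs straight into a capitalised name. The function name splitLine is hypothetical.

```php
<?php
// Hypothetical helper: split one line into its fields with a single pattern.
// The field boundaries are assumptions inferred from the one sample line.
function splitLine(string $line): ?array
{
    $pattern = '/^(\d+\/\d+ \d+) (.+?) (\(\d+\)) (.+?) (\(\d+\))\s*([a-z]+)([A-Z].*)$/';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // the line does not follow the assumed layout
    }

    array_shift($m); // drop the full match, keep the seven captured fields
    return $m;
}

print_r(splitLine('22/9 14 ALTERNATE (16) myXMG fe (2)infernoHitbox Arena'));
```

If other lines break these assumptions (for example a mixed-case name after the second count), the lowercase/uppercase boundary would need a different rule.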

If all the data you have uses the same character indexes, you should try substr:
http://php.net/manual/en/function.substr.php
I also found a similar question here:
separate string in two by given position

Related

Regular Expression in Serialized Data

I am looking to do a database search on serialized data. I am currently using Symfony2 as my framework, making pdo_mysql calls using Doctrine 2. What I would like to do is create a query that uses REGEXP to find data within a certain part of the array. The data I am trying to search within looks like this:
a:1:{s:8:"bedrooms";a:5:{i:0;i:1;i:1;i:2;i:2;i:3;i:3;i:4;i:4;s:2:"5+";}}
So let's say I am looking for a record that has 3 bedrooms, then I would want it to find: -
i:2;i:3
The query I have come up with so far is: -
SELECT * FROM table WHERE field_name REGEXP '.*"bedrooms"; a:[0-9]+:{i:[0-9]+;i:3;}.*';
However, this doesn't work. Can someone help me find a fix for this, please? I think it's down to the way the regular expression is written.
It's also worth noting that there are other arrays stored in the field, such as credit limits and other data.
Thank you in advance.
I believe you can do it with the help of a negated character class [^{}] that matches any character except { and }:
.*"bedrooms";a:[0-9]+:[{][^{}]*i:[0-9]+;i:3[^{}]*[}]
See the regex demo
I see at least two mistakes, plus some improvements you can make:
First, in the regex, drop the blank space after "bedrooms";
You should escape the curly braces, like \{ and \}, since the regex engine does not treat them as literal characters
If you are interested in a specific chunk of the string, you must specify it as a group and say what kind of characters surround it, like
"bedrooms";a:[0-9]+:\{.*(i:[0-9];i:3).*\}
In this case I'm looking for i:*;i:3, where * is any single digit
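
As a hedged sketch, the negated-character-class pattern from the first answer can be tried in PHP against the sample value before it goes into a query. The function names and the second sample value are hypothetical. Note that the pattern cannot tell array keys from values and also matches numbers that merely start with 3 (such as 30), so decoding the value with unserialize is the stricter check when doing the filtering in PHP is an option.

```php
<?php
// Pattern from the answer, wrapped in PCRE delimiters. The leading .* is
// dropped because preg_match already searches anywhere in the string.
function matchesThreeBedrooms(string $data): bool
{
    $pattern = '/"bedrooms";a:[0-9]+:[{][^{}]*i:[0-9]+;i:3[^{}]*[}]/';
    return preg_match($pattern, $data) === 1;
}

// Stricter cross-check: decode the value and look for the integer 3.
function hasThreeBedrooms(string $data): bool
{
    $decoded = unserialize($data, ['allowed_classes' => false]);
    return is_array($decoded)
        && in_array(3, $decoded['bedrooms'] ?? [], true);
}

// Sample value from the question.
$sample = 'a:1:{s:8:"bedrooms";a:5:{i:0;i:1;i:1;i:2;i:2;i:3;i:3;i:4;i:4;s:2:"5+";}}';
// Hypothetical value with 4 and 30 bedrooms only, to show the false positive.
$other = 'a:1:{s:8:"bedrooms";a:2:{i:0;i:4;i:1;i:30;}}';

var_dump(matchesThreeBedrooms($sample), hasThreeBedrooms($sample)); // both true
var_dump(matchesThreeBedrooms($other), hasThreeBedrooms($other));   // true, false
```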

String Compression Class

I am trying to make a string compression system that can compress strings containing frequently used words.
But I have no idea how to make the logic work.
I was thinking of replacing words that appear often with a simple <1> and putting each word in an array, so that when reading the string we can see that <1> stands for the first word in the array, or something like that.
But that is not my problem at the moment.
I'm trying to figure out how I could actually count how many times a word appears.
I can't really use explode(' ',$str); and count occurrences, since I would like to check not only words but everything: for example, if there is always a space between two words, I would like to store the pair, space included, in my array as well.
All of that with the idea of compressing a string.
I am not looking for code, though; I am simply trying to find good logic to make this work.
Does anyone have an idea of how I could achieve that?
Thanks for any comment/answer.
I think the only way to do this is a sliding window... Hopefully you are using small strings :)
So, let's say your string was:
"Joey Novak Needs More Reputation :)"
We start with a 10-character window and search the string for other instances of it. The first 10-character string would be "Joey Novak", so we search the remainder of the string for that. If we find one, awesome! We replace it with the marker (<1> works) and search again. If we don't, we move on to the next string, which would be "oey Novak ", and do the same, and so on. When we finish with all the 10-character strings, we move on to 9 characters and work our way down. Since the marker is 3 characters long, you only need to go down to 4-character strings.
Joey
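
The question asked for logic rather than code, but the sliding-window idea is short enough to sketch. This is one possible reading of it, not a definitive implementation: the function names are hypothetical, and it assumes the input never contains < or >, so that markers cannot be confused with real text.

```php
<?php
// Sliding-window sketch. Assumes the input never contains '<' or '>'.
function compressString(string $text, int $maxLen = 10, int $minLen = 4): array
{
    $dictionary = [];

    // Try the longest windows first, then work down to the shortest.
    for ($len = $maxLen; $len >= $minLen; $len--) {
        $pos = 0;
        while ($pos + $len <= strlen($text)) {
            $candidate = substr($text, $pos, $len);

            // Never build a candidate out of an existing marker.
            if (strpbrk($candidate, '<>') !== false) {
                $pos++;
                continue;
            }

            // Is there another, non-overlapping copy further along?
            if (strpos($text, $candidate, $pos + $len) !== false) {
                $dictionary[] = $candidate;
                $marker = '<' . count($dictionary) . '>';
                $text = str_replace($candidate, $marker, $text);
                $pos += strlen($marker); // continue after the new marker
            } else {
                $pos++;
            }
        }
    }

    return [$text, $dictionary];
}

// Expand every <n> marker back into the n-th dictionary entry.
function decompressString(string $text, array $dictionary): string
{
    $map = [];
    foreach ($dictionary as $i => $chunk) {
        $map['<' . ($i + 1) . '>'] = $chunk;
    }
    return $map === [] ? $text : strtr($text, $map);
}

[$packed, $dict] = compressString('the cat and the dog and the bird');
echo $packed, "\n"; // the cat<1>dog<1>bird
```

Once the dictionary has ten or more entries the marker grows to four characters, so replacing 4-character strings stops saving space; raising $minLen is one way to account for that.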

Splitting a URL

I'm trying to split the following URL format: https://www.facebook.com/media/set/?set=a.10150495063500716.644503.10150093906460716&type=3
I'm trying to get the first number, between 'a.' and the next '.'. I've been playing around with preg_match, but to no avail. I'm not experienced at all with regex, so perhaps I'm just using an incorrect method or syntax. My attempts result in each character becoming an array key. If there's a simpler method than using regex then I'm all ears; I was just pointed in this direction. All I need is the number.
Any and All help is appreciated.
Without knowing the exact language you're using to solve this problem the following expression may do what you want:
"a[.](\d+)[.]"
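
Assuming PHP (the question mentions preg_match), the expression above could be used as in the sketch below. The second function shows a regex-free alternative built on parse_url, parse_str and explode. Both function names are hypothetical, and the number is kept as a string rather than cast to an integer.

```php
<?php
// Option 1: the expression from the answer, with PCRE delimiters added.
function firstNumberByRegex(string $url): ?string
{
    return preg_match('/a[.](\d+)[.]/', $url, $m) === 1 ? $m[1] : null;
}

// Option 2: no regex. Read the "set" query parameter and split it on dots.
function firstNumberByParsing(string $url): ?string
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string at all
    }

    parse_str($query, $params);
    $set = $params['set'] ?? null;
    if (!is_string($set)) {
        return null; // "set" missing or not a plain value
    }

    $parts = explode('.', $set); // ['a', first number, ...]
    return $parts[1] ?? null;
}

$url = 'https://www.facebook.com/media/set/?set=a.10150495063500716.644503.10150093906460716&type=3';
echo firstNumberByRegex($url), "\n";   // 10150495063500716
echo firstNumberByParsing($url), "\n"; // 10150495063500716
```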

Single regular expression that extracts a number from two different url formats?

I am trying to create a single regular expression that I can use to extract the number from two different URLs in a PHP function. The formats of these URLs are:
/t/2121/title/
and
/top2121.html
I am bad at regular expressions and have already tried the following and many variants of it:
#^/t/(\d+?)/|/top(\d+?)\.html/#i
This is not doing anything and I am still at a complete loss after reading many sites and tutorials on regular expressions. Is there a regular expression I could create that would allow me to extract the number regardless of the URL format entered?
Regex to extract only the digits while also checking that the URL matches one of the accepted formats (edit: captures in 2 groups):
#^\/t(?:\/(\d+)\/[a-z_-]+\/?|op(\d+)\.html)$#i
Explained demo here: http://regex101.com/r/dO5dI4
Variant #2: captures in the same group
#^\/t(?|\/(\d+)\/[a-z_-]+\/?$|op(\d+)\.html$)#i
Explained demo here: http://regex101.com/r/cG9vC3
If you just want the first digits after t, regardless of the / in between, something like this might work: #t/?(\d+)#i
edit:
example: http://codepad.viper-7.com/0z3ee0
I was able to get this regexp to match both types of url formats:
#^/(?:(?:t/)|(?:top))(\d+)(?:(?:\.html)|(?:/))#i
If anyone has a more efficient way of performing the same regexp, I would love to hear it.
If you have either one of these URLs, you could use this expression. The number will be in the second capture group:
#^/t(op|/)(\d+)(\.html|/.*)#i
Are there ever going to be numbers in the URL that you don't care about? If not, you can keep this simple by just capturing the numbers and ignoring the rest:
#(\d+)#
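
A short sketch of how the branch-reset variant from the answers, (?|...), might be wrapped in a PHP function. Because both alternatives write to capture group 1, the caller does not need to know which URL format matched. The function name extractNumber is hypothetical, and the slashes are left unescaped since # is the delimiter.

```php
<?php
// Returns the number from /t/<n>/<title>/ or /top<n>.html, or null otherwise.
function extractNumber(string $url): ?int
{
    // (?| ... ) is a branch-reset group: each alternative starts
    // numbering its capture groups again from 1.
    $pattern = '#^/t(?|/(\d+)/[a-z_-]+/?$|op(\d+)\.html$)#i';

    if (preg_match($pattern, $url, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

var_dump(extractNumber('/t/2121/title/')); // int(2121)
var_dump(extractNumber('/top2121.html'));  // int(2121)
var_dump(extractNumber('/other/2121'));    // NULL
```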

Best way to parse a text document

I'm trying to parse a plain text document in PHP but have no idea how to do it correctly.
I want to separate each word, assign them an ID and save the result in JSON format.
Sample text:
"Hello, how are you (today)"
This is what I'm doing at the moment:
$document_array = explode(' ', $document_text);
json_encode($document_array);
The resulting JSON is
[["Hello,"],["how"],["are"],["you"],["(today)"]]
How do I ensure that spaces are kept in-place and that symbols are not included along with the words...
[["Hello"],[", "],["how"],[" "],["are"],[" "],["you"],[" ("],["today"],[")"]]
I’m sure some sort of regex is required... but have no idea what kind of pattern to apply to deal with all cases... Any suggestions guys?
This is actually a really complex problem, and one that's subject to a fair amount of academic research. It sounds so simple (just split on whitespace, with maybe a few rules for punctuation) but you quickly run into issues. Is "didn't" one word or two? What about hyphenated words? Some might be one word, some might be two. What about multiple successive punctuation characters? Possessives versus quotes? And so on. Even determining the end of a sentence is non-trivial. (It's just a full stop, right?!)
This problem is one of tokenisation and a topic that search engines take very seriously. To be honest you should really look at finding a tokeniser in your language of choice.
Maybe this?
array_filter(preg_split('/\b/', $document_text))
The array_filter call removes the empty values at the first and/or last index of the resulting array, which appear if your string starts or ends with a word boundary (for \b see: http://php.net/manual/en/regexp.reference.escape.php). Note that array_filter without a callback also drops any token equal to "0".
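
A minimal sketch of the preg_split idea above, using the PREG_SPLIT_NO_EMPTY flag instead of array_filter so that only empty strings are discarded and the result stays a sequentially indexed list. The array index can then serve as the token ID. The function name tokenize is hypothetical, and, as the other answer points out, \b is only a rough notion of a word (for example, "didn't" is split at the apostrophe).

```php
<?php
// Split on every word boundary, keeping whitespace and punctuation as tokens.
function tokenize(string $text): array
{
    return preg_split('/\b/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$tokens = tokenize('Hello, how are you (today)');

// Keys 0..n act as IDs; the pieces concatenate back to the original text.
echo json_encode($tokens), "\n";
// ["Hello",", ","how"," ","are"," ","you"," (","today",")"]
```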
