
Replace adjacent identical tokens that match a regex


Sorry, I was working on the answer before seeing your first comment. If this doesn't answer your question, let me know, and I'll remove it or try to modify it accordingly.

For the simple input provided in the question (stored in the my_string variable in the code below), you could try a different approach: walk your list of words and keep a "bucket" of <matching_word, num_of_occurrences> for each run of adjacent identical words:

my_string = "xyz abc abc zzq ak9 ak9 ak9 foo abc"
my_splitted_string = my_string.split(' ')
occurrences = []
print("my_splitted_string is a %s now containing: %s"
      % (type(my_splitted_string), my_splitted_string))

# Each bucket is a [word, count] pair describing one run of adjacent identical words.
current_bucket = [my_splitted_string[0], 1]
occurrences.append(current_bucket)
for i in range(1, len(my_splitted_string)):
    current_word = my_splitted_string[i]
    print("Does %s match %s?" % (current_word, current_bucket[0]))
    if current_word == current_bucket[0]:
        # Same word as the current run: bump its counter.
        current_bucket[1] += 1
        print("It does. Aggregating")
    else:
        # Different word: start a new run.
        current_bucket = [current_word, 1]
        occurrences.append(current_bucket)
        print("It doesn't. Creating a new 'bucket'")

print("Collected occurrences: %s" % occurrences)
# Now re-collect:
re_collected_str = ""
for occurrence in occurrences:
    if occurrence[1] > 1:
        re_collected_str += "%s*%d " % (occurrence[0], occurrence[1])
    else:
        re_collected_str += "%s " % (occurrence[0])
print("Compressed string: '%s'" % re_collected_str)

This outputs (under Python 2, which reports the list type as <type 'list'>):

my_splitted_string is a <type 'list'> now containing: ['xyz', 'abc', 'abc', 'zzq', 'ak9', 'ak9', 'ak9', 'foo', 'abc']
Does abc match xyz?
It doesn't. Creating a new 'bucket'
Does abc match abc?
It does. Aggregating
Does zzq match abc?
It doesn't. Creating a new 'bucket'
Does ak9 match zzq?
It doesn't. Creating a new 'bucket'
Does ak9 match ak9?
It does. Aggregating
Does ak9 match ak9?
It does. Aggregating
Does foo match ak9?
It doesn't. Creating a new 'bucket'
Does abc match foo?
It doesn't. Creating a new 'bucket'
Collected occurrences: [['xyz', 1], ['abc', 2], ['zzq', 1], ['ak9', 3], ['foo', 1], ['abc', 1]]
Compressed string: 'xyz abc*2 zzq ak9*3 foo abc '

(Beware of the trailing space at the end of the compressed string; call strip() on the result if you don't want it.)
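If it helps, the same bucket idea can probably be written more compactly with itertools.groupby, which groups runs of adjacent equal items. This is only a sketch for Python 3, not part of the code above: the function name compress_runs and its optional token_pattern argument are made up for illustration, and token_pattern is my guess at how the "tokens that match a regex" part of the question could be plugged in (only words fully matching it get collapsed).

```python
import re
from itertools import groupby


def compress_runs(text, token_pattern=None):
    """Collapse runs of adjacent identical words into 'word*count'.

    token_pattern is a hypothetical optional regex: when given, only words
    that fully match it are collapsed; all other words are left untouched.
    """
    parts = []
    # groupby yields one (word, run) pair per run of adjacent equal words.
    for word, run in groupby(text.split()):
        count = len(list(run))
        collapsible = token_pattern is None or re.fullmatch(token_pattern, word)
        if count > 1 and collapsible:
            parts.append("%s*%d" % (word, count))
        else:
            # A single word, or a repeated word that does not match the pattern.
            parts.extend([word] * count)
    # Joining with spaces means there is no trailing blank to strip.
    return " ".join(parts)


print(compress_runs("xyz abc abc zzq ak9 ak9 ak9 foo abc"))
# xyz abc*2 zzq ak9*3 foo abc
print(compress_runs("xyz abc abc zzq ak9 ak9 ak9 foo abc", r"[a-z]+\d"))
# xyz abc abc zzq ak9*3 foo abc
```

Note that text.split() with no argument splits on any whitespace, so this version also copes with tabs, newlines, or repeated spaces in the input, which split(' ') does not.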


Categories : Python

© Copyright 2017 spot7.org Publishing Limited. All rights reserved.