Native Python queues do not allow you to stop and resume an application without loosing queue items. Adding persistence of objects to a derived Python Queue class partially addresses this issue. We use the DyBASE embedded object oriented database to persist queues items in files. If needed, you can instantiate multiple queues pointing to different files. Since our PersistentQueue class is derived from the Python Queue, it works in multithreading environments. Transactional support such as acknowledging successfully processed queue items is not currently a feature of this class.
In Using Queues in Web Crawling and Analysis Infrastructure we noted the relevancy of queues to connect heterogeneous technologies. Queues are also used in the context of a single technology to follow the typical producer/consumer pattern. For example, the Python programming language offers FIFO and priority queues, as does .NET. However, neither of these native queues persists. The Microsoft Windows Azure platform incorporates persistant queues but has other limitations, and also may be overkill for your solution.
There are several ways to persist a queue. If the items that you want to persist have a fixed buffer length then Berkeley DB’s queues or STXXL’s queues work well. You can’t use database managers like GDBM if you need a FIFO queue since you need to traverse the elements in order and the hash table does not assure this order. STXXL, and DyBASE use a B+Tree data structure. You may be tempted to use a database engine like SQLite which can be useful in many scenarios, but an SQL engine adds complexity that is not required for FIFO queues.
Prerequisites
- DyBASE: http://www.garret.ru/dybase.html
Code
The code is also available at github.
#!/usr/bin/python
from Queue import Queue
import dybase
import sys
MAX_INT = sys.maxint
MIN_INT = -MAX_INT - 1
#DEBUG = True
DEBUG = False
class Root(dybase.Persistent):
def __init__(self):
self.start = 0
self.stop = 0
class SizeOfPersistentQueueExceeded(Exception):
pass
class incomplete_persistent_deque:
def __init__(self, filename):
self._init_db(filename)
def _init_db(self, filename):
self.db = dybase.Storage()
if self.db.open(filename):
self.root = self.db.getRootObject()
if self.root == None:
self.root = Root()
self.root.elements = self.db.createIntIndex() # createLongIndex can be used on 64 bits systems but it is strange to pass 2**32 elements in the queue
self.root.pending_elements = self.db.createIntIndex()
self.db.setRootObject(self.root)
self.db.commit()
else:
if DEBUG:
print "self.root already exists"
if DEBUG:
print "self.root.start =", self.root.start
print "self.root.stop = ", self.root.stop
def __len__(self):
if self.root.stop >= self.root.start:
return self.root.stop - self.root.start
else:
return (MAX_INT - self.root.start + 1) + (self.root.stop - MIN_INT)
def append(self, item):
# add element to index
self.root.elements.insert(self.root.stop, item)
self.root.stop += 1
if self.root.stop > MAX_INT:
# check also if stop touches start
self.root.stop = MIN_INT
if self.root.start == self.root.stop:
raise SizeOfPersistentQueueExceeded
# persist
self.root.store()
self.db.commit()
def popleft(self):
# don't check if empty, Queue class take charge of that
# remove element from index
item = self.root.elements.get(self.root.start)
self.root.elements.remove(self.root.start)
self.root.start += 1
if self.root.start > MAX_INT:
# check also if start touches stop
self.root.start = MIN_INT
if self.root.start == self.root.stop: # if queue is empty resync start & stop to 0. It is for beautifier purposes can be removed.
self.root.start = 0
self.root.stop = 0
# persist
self.root.store()
self.db.commit()
return item
class PersistentQueue(Queue):
def __init__(self, filename, maxsize = 0):
self.filename = filename
Queue.__init__(self, maxsize)
def _init(self, maxsize):
# original: self.queue = deque()
# incomplete_persistent_deque:
# - incomplete implementation but enough for Queue:
# - implemented methods:
# -- __len__
# -- append
# -- popleft
#
self.queue = incomplete_persistent_deque(self.filename)
def connect(self): # to handle failovers
pass
def ack(self):
pass
#def ack(self, item):
class ElementTest:
def __init__(self, value):
self.value = value
def __repr__(self):
return self.value
def __str__(self):
return self.value
def test1():
q = PersistentQueue("myqueue.dbs")
if not q.empty(): # get pending items
while not q.empty():
e = q.get()
print e
for s in ['one', 'two', 'three']:
q.put(ElementTest(s))
def main(): # run this script twice to see the persisted elements
test1()
if __name__ == '__main__':
main()
See Also
- Esoteric Queue Scheduling Disciplines
- Using Queues in Web Crawling and Analysis Infrastructure
- Adding Acknowledgement Semantics to a Persistent Queue
Resources
- [queue persistent site:stackoverflow.com] Google query
- bsddb3 Python interface for Berkeley DB
- bsddb3 examples
- STXXL queue class template