How Iterate or remove MongoDb array list item using pymongo?

1.4k views Asked by At

I want to iterate Mongodb database Arraylist items(TRANSACTION list) and remove Arraylist specific(TRANSACTION List) item using pymongo ?

enter image description here

I create Mongo collection as above using python pymongo. I want to iterate array list item using pymongo and remove final item only in Arraylist?

Data insert query using Python pymongo

 # added new method  create block chain_structure
    def addCoinWiseTransaction(self, senz, coin, format_date):
        self.collection = self.db.block_chain
        coinValexists = self.collection.find({"_id": str(coin)}).count()
        print('coin exists : ', coinValexists)
        if (coinValexists > 0):
            print('coin hash exists')
            newTransaction = {"$push": {"TRANSACTION": {"SENDER": senz.attributes["#SENDER"],
                                                        "RECIVER": senz.attributes["#RECIVER"],
                                                        "T_NO_COIN": int(1),
                                                        "DATE": datetime.datetime.utcnow()
                                                        }}}
            self.collection.update({"_id": str(coin)}, newTransaction)
        else:
            flag = senz.attributes["#f"];
            print flag
            if (flag == "ccb"):
                print('new coin mined othir minner')
                root = {"_id": str(coin)
                    , "S_ID": int(senz.attributes["#S_ID"]), "S_PARA": senz.attributes["#S_PARA"],
                        "FORMAT_DATE": format_date,
                        "NO_COIN": int(1),
                        "TRANSACTION": [{"MINER": senz.attributes["#M_S_ID"],
                                         "RECIVER": senz.attributes["#RECIVER"],
                                         "T_NO_COIN": int(1),
                                         "DATE": datetime.datetime.utcnow()
                                         }
                                        ]
                        }
                self.collection.insert(root)
            else:
                print('new coin mined')
                root = {"_id": str(coin)
                    , "S_ID": int(senz.attributes["#S_ID"]), "S_PARA": senz.attributes["#S_PARA"],
                        "FORMAT_DATE": format_date,
                        "NO_COIN": int(1),
                        "TRANSACTION": [{"MINER": "M_1",
                                         "RECIVER": senz.sender,
                                         "T_NO_COIN": int(1),
                                         "DATE": datetime.datetime.utcnow()
                                         }
                                        ]
                        }
                self.collection.insert(root)

        return 'DONE'
1

There are 1 answers

7
chridam On BEST ANSWER

To remove the last entry, the general idea (as you have mentioned) is to iterate the array and grab the index of the last element as denoted by its DATE field, then update the collection by removing it using $pull. So the crucial piece of data you need for this to work is the DATE value and the document's _id.

One approach you could take is to first use the aggregation framework to get this data. With this, you can run a pipeline where the first step if filtering the documents in the collection by using the $match operator which uses standard MongoDB queries.

The next stage after filtering the documents is to flatten the TRANSACTION array i.e. denormalise the documents in the list so that you can filter the final item i.e. get the last document by the DATE field. This is made possible with the $unwind operator, which for each input document, outputs n documents where n is the number of array elements and can be zero for an empty array.

After deconstructing the array, in order to get the last document, use the $group operator where you can regroup the flattened documents and in the process use the group accumulator operators to obtain the last TRANSACTION date by using the $max operator applied to its embedded DATE field.

So in essence, run the following pipeline and use the results to update the collection. For example, you can run the following pipeline:

mongo shell

db.block_chain.aggregate([
    { "$match": { "_id": coin_id } },
    { "$unwind": "$TRANSACTION" },
    { 
        "$group": {
            "_id": "$_id",
            "last_transaction_date": { "$max": "$TRANSACTION.DATE" }
        }
    }
])

You can then get the document with the update data from this aggregate operation using the toArray() method or the aggregate cursor and update your collection:

var docs = db.block_chain.aggregate([
    { "$match": { "_id": coin_id } },
    { "$unwind": "$TRANSACTION" },
    { 
        "$group": {
            "_id": "$_id",
            "LAST_TRANSACTION_DATE": { "$max": "$TRANSACTION.DATE" }
        }
    }
]).toArray()

db.block_chain.updateOne(
    { "_id": docs[0]._id },
    { 
        "$pull": { 
            "TRANSACTION": { 
                "DATE": docs[0]["LAST_TRANSACTION_DATE"] 
            } 
        } 
    }
)

python

def remove_last_transaction(self, coin):
    self.collection = self.db.block_chain

    pipe = [
        { "$match": { "_id": str(coin) } },
        { "$unwind": "$TRANSACTION" },
        { 
            "$group": {
                "_id": "$_id",
                "last_transaction_date": { "$max": "$TRANSACTION.DATE" }
            }
        }
    ]

    # run aggregate pipeline
    cursor = self.collection.aggregate(pipeline=pipe)
    docs = list(cursor)

    # run update
    self.collection.update_one(
        { "_id": docs[0]["_id"] },
        { 
            "$pull": { 
                "TRANSACTION": { 
                    "DATE": docs[0]["LAST_TRANSACTION_DATE"] 
                } 
            } 
        }
    )

Alternatively, you can run a single aggregate operation that will also update your collection using the $out pipeline which writes the results of the pipeline to the same collection:

If the collection specified by the $out operation already exists, then upon completion of the aggregation, the $out stage atomically replaces the existing collection with the new results collection. The $out operation does not change any indexes that existed on the previous collection. If the aggregation fails, the $out operation makes no changes to the pre-existing collection.

For example, you could run this pipeline:

mongo shell

db.block_chain.aggregate([
    { "$match": { "_id": coin_id } },
    { "$unwind": "$TRANSACTION" },
    { "$sort": { "TRANSACTION.DATE": 1 } }
    { 
        "$group": {
            "_id": "$_id",
            "LAST_TRANSACTION": { "$last": "$TRANSACTION" },
            "FORMAT_DATE": { "$first": "$FORMAT_DATE" },
            "NO_COIN": { "$first": "$NO_COIN" },
            "S_ID": { "$first": "$S_ID" },
            "S_PARA": { "$first": "$S_PARA" },
            "TRANSACTION": { "$push": "$TRANSACTION" }
        }
    },
    {
        "$project": {
            "FORMAT_DATE": 1,
            "NO_COIN": 1,
            "S_ID": 1,
            "S_PARA": 1,
            "TRANSACTION": { 
                "$setDifference": ["$TRANSACTION", ["$LAST_TRANSACTION"]]
            }
        }
    },
    { "$out": "block_chain" }
])

python

def remove_last_transaction(self, coin):    
    self.db.block_chain.aggregate([
        { "$match": { "_id": str(coin) } },
        { "$unwind": "$TRANSACTION" },
        { "$sort": { "TRANSACTION.DATE": 1 } },
        { 
            "$group": {
                "_id": "$_id",
                "LAST_TRANSACTION": { "$last": "$TRANSACTION" },
                "FORMAT_DATE": { "$first": "$FORMAT_DATE" },
                "NO_COIN": { "$first": "$NO_COIN" },
                "S_ID": { "$first": "$S_ID" },
                "S_PARA": { "$first": "$S_PARA" },
                "TRANSACTION": { "$push": "$TRANSACTION" }
            }
        },
        {
            "$project": {
                "FORMAT_DATE": 1,
                "NO_COIN": 1,
                "S_ID": 1,
                "S_PARA": 1,
                "TRANSACTION": { 
                    "$setDifference": ["$TRANSACTION", ["$LAST_TRANSACTION"]]
                }
            }
        },
        { "$out": "block_chain" }
    ])

Whilst this approach can be more efficient than the first, it requires knowledge of the existing fields first so in some cases the solution cannot be practical.