I’ll jump-start this blog with a recent discovery I made while experimenting with Python, Maven, and Robot Framework.
The requirement is to check a relatively big CSV file (around 200K lines with a dozen columns) for data continuity, i.e. whether the period covered by the file contains data for every day of that period.
I had no real experience with Python (I had tried it for some simple tasks, but nothing serious), so this came as a surprise to me: pandas. It is a really powerful data analysis library for Python, with parts of it written in C, which is why it isn’t compatible with Jython (the Java implementation of Python).
For now (at least), I use only a small part of what pandas can do, but even that part impressed me: a CSV file of 200K to 400K lines with a couple of dozen columns is parsed, analysed and transformed in around 4 to 5 seconds! Compare that with a simple algorithm in Jython/Robot Framework: 1 hour and 30 minutes! Okay, maybe my algorithm isn’t optimised, but now that I have discovered pandas, I don’t need to optimise anything, only to switch the technology. Thank you, Google!
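To give an idea of what such a continuity check can look like in pandas, here is a minimal sketch. It is not the code from my project: it assumes the dates live in a single column called `date` (a hypothetical name, so adjust it to the real CSV header), and `missing_days` is likewise a hypothetical helper name. It reads only that column, reduces the timestamps to calendar days, builds the full daily range between the first and last day, and returns the days that never appear in the file.

```python
import io

import pandas as pd


def missing_days(csv_source, date_column="date"):
    """Return the calendar days absent between the first and last date in the CSV.

    `date_column` is a hypothetical column name; change it to match the real file.
    """
    # Parse only the column we need; parse_dates converts it to datetime64.
    df = pd.read_csv(csv_source, usecols=[date_column], parse_dates=[date_column])
    # Truncate timestamps to midnight and keep each calendar day once
    # (a file usually has many rows per day).
    days = pd.DatetimeIndex(df[date_column].dt.normalize().unique())
    # Every day that should be present in the period covered by the data.
    expected = pd.date_range(days.min(), days.max(), freq="D")
    # Days from the expected range that never appear in the data.
    return expected.difference(days)


# Usage: a tiny in-memory CSV with 2016-01-03 missing.
sample = io.StringIO(
    "date,value\n"
    "2016-01-01,1\n"
    "2016-01-02,2\n"
    "2016-01-02,3\n"
    "2016-01-04,4\n"
)
gaps = missing_days(sample)
print(list(gaps.strftime("%Y-%m-%d")))  # ['2016-01-03']
```

The design choice here is to compare whole sets of dates instead of walking the file row by row: the work happens inside pandas' vectorised operations rather than in a Python-level loop over the rows.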