Open Source love-hate relationship

I have been working with different open source projects for a long time. My experience has varied a lot depending on the project and its level of maturity.

Currently, I am working with two major frameworks: Airflow and Apache Beam.

In this post, I would like to describe my experience with Apache Beam, which has been fairly good.

Apache Beam (https://beam.apache.org/) is a unified programming model for big data (competing with Spark, Flink, and others). This post will not go over the features of Beam, but will address my experience with it from the open source perspective.

When you purchase a framework, you have the feeling that a company is backing you up and that any problem you run into can be solved by opening a ticket. In my experience, this is not true. Yes, if your issue is a simple one, first-tier support can help you out. But what if a feature does not work as you expected, or something is missing? You have to hope the vendor adds it to their roadmap, and then wait for the release (usually months).

I have been using Beam to migrate data from MongoDB to the Google platform. I have previous experience working with MongoDB, so I know some of the features that exist there. I was a bit surprised to find out that Apache Beam did not support SSL connections to the server. In addition, I greatly missed projection, the feature that would let me fetch only part of each document.
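To make the idea of projection concrete, here is a minimal sketch in plain Python. It is not Beam or MongoDB driver code: the `project` helper and the sample `order` document are hypothetical names used only for illustration, and the sketch handles top-level fields only.

```python
from typing import Any, Dict, Iterable


def project(document: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `document` containing only the requested top-level fields."""
    wanted = set(fields)
    return {key: value for key, value in document.items() if key in wanted}


# Hypothetical document, as it might be stored in a collection.
order = {
    "_id": 1,
    "customer": "alice",
    "total": 42.5,
    "items": ["book", "pen"],
    "audit_log": ["created", "paid", "shipped"],
}

# Keep only the fields the migration actually needs.
slim = project(order, ["_id", "customer", "total"])
print(slim)
```

With a real database projection the filtering happens on the server, so the unwanted fields never cross the network. That saving is what makes the feature matter when migrating large documents.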

So what can I do about it? First, any time you use an open source project, you should subscribe to its mailing list (in the case of Beam, head over to the google group). On the mailing list, you can gauge how stable the platform is from the questions being asked. It is also a source of knowledge about the features, even if you do not plan on contributing code. Finally, you can contribute code.

After reading up on the source code (in beam-git), I understood that it would not be that hard to add the missing code for the features I needed. To begin with, I copied the files that I needed to change into my project (under the same package name) so that I could change the code. Once I had working features (yes, with tests), I could think about contributing the code back to the open source project. Depending on the project, there is usually a readme explaining how to contribute code (contribute). I have to admit that the process of contributing has in itself taught me a lot (you can see my commits at projection, SSL).

I have to admit that it is not always so rosy. I saw there was a new version out, so I upgraded my system (in the dev environment, not production) and found that the filters for the data were not working properly. This was very concerning, since I did not expect it to happen. The reason was that someone else had committed a new feature to the project, and it broke the existing code. This could also happen on an enterprise platform, where you would expect a very fast patch to be put out. So again I copied the code into my project, checked what broke, and on the same day fixed the code and submitted the patch to the open source project (custom filters). One issue that should be looked into is how often the open source project releases new versions. You do not want to carry your private changes for too long, or you may forget to delete them when the new version comes out. In my opinion, a release once a month is great; with longer gaps it becomes hard to keep track.

To summarize, my experience with open source has been a good one. Being able to fix issues quickly as they arise is very powerful, and giving back to the community is fulfilling. So I invite you all to use open source as much as possible and to contribute back to the community as much as possible.

Backend/Data Architect

Backend Group