Data Streaming Roadmap
We have come to the end of our current roadmap, and I would like to share our experience and journey with you.
Every quarter we take a look at the industry and adjust our roadmap accordingly, so that we stay up to date. In the previous roadmap we learned the ins and outs of a data lake. To wrap up the roadmap for the data profile (what our profiles are deserves a post of its own), we finished with a roadmap on streaming.
We started the roadmap by going over the technologies that we felt were relevant to data streaming. They cover a wide range, from databases and frameworks to streaming architectures and distributed computing.
10.12.20 — Introduction to Streaming
10.19.20 — Advanced Topics in Streaming
10.26.20 — Apache Beam Workshop
11.16.20 — Streaming Architectures
11.23.20 — Apache Pulsar Workshop
11.30.20 — Building a Mini-Debezium CDC
12.21.20 — Streaming Analytics with Druid and Kafka
12.28.20 — Group Fuzeday summary
Obviously we could not cover every topic. The first two sessions dealt with what streaming is: we covered the difference between processing time and event time, and of course all the building blocks. For a great book on the topic, see Streaming Systems. We then moved on to advanced topics of streaming in production, with issues like schema evolution, load balancing, and much more. To close out this first step, we worked through a workshop using Apache Beam on GCP Dataflow.
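To give a flavor of what the Beam workshop looked like, here is a minimal sketch of an event-time windowed pipeline using the Beam Python SDK. The input file and field names are hypothetical; the point is assigning timestamps from the event itself rather than from when it happens to arrive.

```python
# Minimal Apache Beam sketch: counting events per user in 1-minute
# event-time windows. File name and field names are hypothetical.
import json

import apache_beam as beam
from apache_beam.transforms import window

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("events.jsonl")
        | "Parse" >> beam.Map(json.loads)
        # Event time comes from the record itself (epoch seconds),
        # not from when the pipeline processes it.
        | "AssignEventTime" >> beam.Map(
            lambda e: window.TimestampedValue(e, e["timestamp"]))
        | "Window" >> beam.WindowInto(window.FixedWindows(60))
        | "KeyByUser" >> beam.Map(lambda e: (e["user_id"], 1))
        | "CountPerWindow" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```

The same pipeline runs locally on the direct runner as written, and can be submitted to GCP Dataflow by switching the runner pipeline option.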
Then, through lectures on different areas of data streaming, we tried to cover the basic building blocks: ingestion, processing, windowing, and storage.
On our full Fuse Day, we split into teams to try out streaming on different platforms and architectures. Each team had a full day to experiment and then present its results to the whole group. The platforms and architectures we tried ranged from serverless streaming on AWS Kinesis to evaluating Databricks Delta Lake.
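As a taste of the Kinesis experiment, here is a minimal producer sketch with boto3; the stream name, region, and event payload are all hypothetical.

```python
# Minimal serverless-streaming sketch: pushing one event into an AWS
# Kinesis data stream with boto3. Stream name, region, and payload
# are hypothetical.
import json

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

event = {"user_id": "u-42", "action": "click", "timestamp": 1606730000}
kinesis.put_record(
    StreamName="fuseday-demo",
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["user_id"],  # picks the shard the record lands on
)
```

On the consuming side, a Lambda function triggered off the stream keeps the setup fully serverless.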