This project was carried out as the end-of-studies project of a research master's in IoT and data processing at Tunisia Polytechnic School, with the SERCOM laboratory, under the supervision of Mrs. Takoua ABDELLATIF and Mr. Aymen YAHYAOUI.
The statistics are mind-blowing: 90% of the world's data has been generated in the last two years alone, and according to Cisco, video streaming and downloading are expected to account for 82% of global internet traffic, 15 times more than in 2017. Despite this huge amount of data, manual analysis can only distinguish 12% of the valuable data, which is not enough to meet current needs. All these figures call for more efficient extraction of value from data, which places higher demands on real-time multimedia processing in big data systems.
In the IoT context, traditional video surveillance systems use multimedia processing engines to analyze incoming streams and send the extracted events to interested parties. While these systems provide real-time multimedia processing, scalability remains a major challenge given the volume and velocity of the data: they cannot keep pace with the addition of new cameras at exponentially higher bandwidth and processing rates. The problem is that the incoming data rate can exceed the processing capacity, which can lead to data loss or delays in real-time processing. In addition, intensive data processing increases power consumption and can even damage system hardware.
In this context, we propose MAFC, a Multimedia Adaptable Flow Controller for big data systems that adapts to system scalability without data loss and optimizes performance according to the application context.
As shown here, for data collection the proposed architecture can ingest video data from either unbounded video streams or stored multimedia sequences (1).
Once the multimedia data has been acquired, it is decomposed into a sequence of individual multimedia objects that are fed one by one to the flow controller (2).
Based on the control policies computed by the policy adapter (8), the flow controller decides which frames are published to the message broker on the relevant topic and which are dropped (3).
Then, the real-time data processing engine extracts the specific events from the incoming frames in real time and publishes them to the relevant topics (6), where they are consumed by the media stream controller, to automatically update the control policies, and by third-party services, to notify the interested parties (7).
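To make the flow-controller idea more concrete, here is a minimal Python sketch of a policy-based frame selector. The `FrameRatePolicy` class, its `max_fps` parameter, and the update mechanism are hypothetical and only illustrate how a control policy could decide which frames are forwarded to the broker and which are dropped; in MAFC the actual policies are computed by the policy adapter.

```python
import time

class FrameRatePolicy:
    """Hypothetical control policy: forward at most `max_fps` frames per second."""

    def __init__(self, max_fps=10):
        self.min_interval = 1.0 / max_fps
        self.last_sent = 0.0

    def update(self, max_fps):
        # Called when the application context changes (e.g. by the policy adapter).
        self.min_interval = 1.0 / max_fps

    def should_publish(self, frame):
        # Publish the frame only if enough time has passed since the last one.
        now = time.time()
        if now - self.last_sent >= self.min_interval:
            self.last_sent = now
            return True
        return False  # the frame is dropped to keep the pipeline within capacity
```

A time-based throttle is only one possible policy; the same interface could just as well select frames by content, priority, or downstream load.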
I am running this project on Ubuntu 20.04, and the installation steps below target that release.
Java and the JVM (Java Virtual Machine) are required for both Apache Kafka and Flink.
First, update the package index:
sudo apt update
Next, check if Java is already installed:
java -version
If Java is not currently installed, you’ll see the following output:
Output
Command 'java' not found, but can be installed with:
sudo apt install openjdk-11-jre-headless # version 11.0.11+9-0ubuntu2~20.04, or
sudo apt install default-jre # version 2:1.11-72
sudo apt install openjdk-13-jre-headless # version 13.0.7+5-0ubuntu1~20.04
sudo apt install openjdk-16-jre-headless # version 16.0.1+9-1~20.04
sudo apt install openjdk-8-jre-headless # version 8u292-b10-0ubuntu1~20.04
Execute the following command to install the default Java Runtime Environment (JRE), which will install the JRE from OpenJDK 11:
sudo apt install default-jre
The JRE will allow you to run almost all Java software. Verify the installation with:
java -version
You’ll see output similar to the following:
Output
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing)
You may need the Java Development Kit (JDK) in addition to the JRE in order to compile and run some specific Java-based software. To install the JDK, execute the following command:
sudo apt-get install openjdk-8-jdk
Getting Kafka up and running can be a bit tricky, so I’d recommend a Google search to match your setup.
Now, before we can start Kafka itself, we need to install ZooKeeper, which we mentioned earlier.
sudo apt-get install zookeeperd
ZooKeeper will start automatically as a daemon listening on port 2181. Let's make sure it's running with:
netstat -ant | grep :2181
The output we want to see is:
tcp6 0 0 :::2181 :::* LISTEN
With that we’re ready for Kafka.
We can wget the download from the Apache site with:
wget http://apache.claz.org/kafka/1.0.1/kafka_2.11-1.0.1.tgz
Make a directory for the extracted file.
sudo mkdir /opt/Kafka
Now extract the Kafka file to our newly minted directory.
sudo tar -xvf kafka_2.11-1.0.1.tgz -C /opt/Kafka/
Ok, let’s run Kafka!
cd /opt/Kafka/kafka_2.11-1.0.1/
sudo bin/kafka-server-start.sh config/server.properties
To test that everything is up and running, open a new terminal and type:
sudo bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic testing
You should see:
Created topic "testing".
To install IntelliJ IDEA Community Edition through snap packages, open the terminal and run the following command:
sudo snap install intellij-idea-community --classic
To get our Kafka clients up and running, we’ll need the Kafka-Python project.
And, while we’re at it, we’ll also need OpenCV for video rendering, as well as Flask for our “distributed” Consumer.
pip install kafka-python opencv-contrib-python Flask
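As a quick, optional sanity check (not part of the project code), you can verify from Python that the packages import correctly and that the broker on port 9092 is reachable; the `testing` topic created earlier should appear in the list:

```python
# check_setup.py -- optional sanity check for the Python dependencies and the broker
import cv2
import flask  # noqa: F401 -- just checking that the import works
from kafka import KafkaConsumer

print("OpenCV version:", cv2.__version__)

# Connect to the local broker and list the topics it knows about.
consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
print("Kafka topics:", consumer.topics())  # should include 'testing'
consumer.close()
```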
The Kafka Server we set up in the last section is bound to port 9092. We’ll use this value when setting up our two Kafka clients.
The first of our Kafka clients will be the message Producer. It is responsible for converting video into a stream of JPEG images.
By default, the Producer streams video directly from the webcam, assuming you have one. If pulling from a video file is more your style (I recommend 15 MB and smaller), the Producer accepts a file name as a command-line argument.
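For reference, here is a minimal sketch of what producer.py could look like. The topic name `video-stream` is an assumption used only for illustration, and the actual file in the repository may differ; the broker address matches the port 9092 setup described above.

```python
# producer.py -- minimal sketch: read frames and publish them as JPEG bytes to Kafka
import sys

import cv2
from kafka import KafkaProducer

TOPIC = "video-stream"  # assumed topic name; adjust to match the consumers

producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Use a video file if one is given on the command line, otherwise the webcam.
source = sys.argv[1] if len(sys.argv) > 1 else 0
video = cv2.VideoCapture(source)

try:
    while True:
        ok, frame = video.read()
        if not ok:
            break  # end of file or camera error
        # Encode the frame as JPEG and send the raw bytes to the topic.
        ok, buffer = cv2.imencode(".jpg", frame)
        if ok:
            producer.send(TOPIC, buffer.tobytes())
finally:
    video.release()
    producer.flush()
```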
Video stream consumer: to read our newly published stream, we'll need a Consumer that accesses our Kafka topic. Since our message streamer was intended for a distributed system, we'll keep the project in that spirit and launch the Consumer as a Flask service.

Detected object consumer (from Flink): this consumer subscribes to the events extracted by Flink and published on a separate topic. As mentioned before, Kafka leaves much of the stream-access discretion to the Consumer, which helps us implement our MAFC system.
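Here is a minimal sketch of the video stream Consumer, assuming the same hypothetical `video-stream` topic as in the Producer sketch above; it relays the JPEG frames as a multipart MJPEG response that the browser can render at /video. The repository's consumer-video-stream.py may differ in the details.

```python
# consumer-video-stream.py -- minimal sketch: serve the Kafka video topic as an MJPEG stream
from flask import Flask, Response
from kafka import KafkaConsumer

TOPIC = "video-stream"  # assumed topic name; must match the Producer

app = Flask(__name__)

def frame_generator():
    consumer = KafkaConsumer(TOPIC, bootstrap_servers="localhost:9092")
    for message in consumer:
        # Each Kafka message value is one JPEG-encoded frame.
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + message.value + b"\r\n")

@app.route("/video")
def video():
    return Response(frame_generator(),
                    mimetype="multipart/x-mixed-replace; boundary=frame")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```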
If you’re interested, the Kafka-Python Documentation provides an in-depth look at everything that’s available.
To run the full pipeline, we'll need to start up Kafka, the two Consumers, and finally the Producer, each in its own terminal. As demonstrated previously, we start Kafka with a simple:
cd /opt/Kafka/kafka_2.11-1.0.1/
sudo bin/kafka-server-start.sh config/server.properties
In a new terminal, we'll start up the consumer-video-stream.py file with:
cd /Kafka
python consumer-video-stream.py
If everything is working, your terminal should read:
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
In the browser, go to http://0.0.0.0:5000/video. You won't see anything here yet, but keep it open because it's about to come to life.
You also need to run the second consumer, which receives the events Flink extracts from the images:
python consumer-flink.py
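For reference, a minimal sketch of what consumer-flink.py could look like. Both the topic name `detected-objects` and the JSON message format are assumptions for illustration; they must match whatever sink the Flink job actually writes to, and the repository's file may differ.

```python
# consumer-flink.py -- minimal sketch: print the events Flink publishes after object detection
import json

from kafka import KafkaConsumer

TOPIC = "detected-objects"  # assumed topic name; must match the Flink job's Kafka sink

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    # Each message is assumed to be one detection event (e.g. label and confidence).
    print("Detected event:", message.value)
```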
For the Producer, it’s more of the same. First, open a new terminal:
python producer.py
Here, we'll be streaming from the webcam, so no additional arguments are needed. Note that the second consumer won't receive anything yet, since we haven't run the Flink project.
Finally, we need to import the Flink project into IntelliJ IDEA, install its dependencies, and run StreamingJob.java under Flink-part/src/main/java/jar/.
If you're interested, references for a more in-depth look are linked below.
| References |
| --- |
| Flink |
| Kafka |
| YOLO |
| Kafka-Python Documentation |