Semantics At The Heart Of The “Big Data” Matter

Fundamental to automating decision making in many application domains is the ability to analyze Big Data. However, maximizing the value of that analysis requires analytics techniques that can “see” beyond purely syntactic representations of data. The meaning of data, or data semantics, must be represented in a machine-processable manner so that the exact nature of relationships in data can be exploited effectively. In particular, many applications require heterogeneous data of different kinds to be assembled and analyzed together to solve important problems. This is impossible without the ability to reason automatically about how the different data are related. Data semantics is also increasingly regarded as a critical component in enabling “interpretability” of the outcomes of machine learning and deep learning techniques, which are currently mostly semantics-oblivious.

Enabling Web Attack Reconstruction

Many modern network security incidents originate from the Web. For instance, it is not uncommon for users to stumble upon a website that hosts malicious advertisements, which in turn may redirect to phishing sites or promote the installation of malicious software via social engineering attacks. In corporate networks, such attacks can have devastating consequences. For example, an initial web-driven malware infection may be used as a stepping stone for larger-scale network intrusions and costly data breaches. When such high-profile incidents are discovered, often weeks or even months after the initial attack took place, a digital forensics team is typically called in to reconstruct the root causes of the incident, so that better network defenses and security policies can be developed. However, a forensic analyst may not be able to reconstruct the entire chain of events back to the initial web attack that is the true root cause of the network breach. This is because modern browsers lack the ability to produce detailed audit logs, and the information contained in the existing navigation history and browser cache is typically too sparse or short-lived to allow for a detailed reconstruction of complex web attacks.


The Web has redefined the way in which malicious activities such as online harassment, disinformation, and radicalization are carried out. To fully understand these phenomena, we need computational tools that can trace malicious activity as it happens and identify the influential entities that carry it out. In this talk, I will present our efforts in developing tools to automatically monitor and model malicious online activities such as coordinated aggression and disinformation. I will then discuss possible mitigations against these harmful activities, keeping in mind the potential unintended consequences that might arise from suspending offending users.