Our architecture follows our three simple steps: Extract, Track and Comply.
For the Extract part of the design, we use a powerful open source flow management infrastructure (Pontus-NiFi), based on the Apache NiFi project, that enables users to convert data from a variety of platforms into a form ready for the Track phase.
For the Track part of the design, we store data in a canonical format, and can run either Online Transaction Processing (OLTP) or Online Analytical Processing (OLAP) queries on the data to clean up the application. We use a Gremlin-based graph database, compliant with Apache TinkerPop 3.3.0, to front those queries; the data is stored in Apache HBase 1.3.1 and indexed with Elasticsearch 5.6.3. We can also apply very rich redaction/filtering rules inside these stores to ensure that not even an administrator can see sensitive data. All the data is encrypted both in flight (TLS) and at rest (using dm-crypt), with keys optionally stored in a Hardware Security Module (HSM).
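To make the idea of redaction/filtering rules more concrete, here is a minimal Python sketch of field-level redaction applied to a record before it is returned to any caller. It is only an illustration: the rule table, the field names (`email`, `national_id`) and the helper functions are hypothetical, and the real rules run inside the data stores rather than in application code like this.

```python
import re
from typing import Any, Callable, Dict


def mask_all(value: Any) -> str:
    """Hide the value completely."""
    return "***"


def mask_email(value: Any) -> str:
    """Hide the local part of an e-mail address but keep the domain."""
    return re.sub(r"^[^@]+", "***", str(value))


# Hypothetical rule table: field name -> function that redacts its value.
REDACTION_RULES: Dict[str, Callable[[Any], str]] = {
    "national_id": mask_all,
    "email": mask_email,
}


def redact(record: Dict[str, Any],
           rules: Dict[str, Callable[[Any], str]] = REDACTION_RULES) -> Dict[str, Any]:
    """Return a copy of the record with every rule-covered field redacted.

    The rules are applied regardless of who is asking, so an administrator
    receives the same redacted view as any other user.
    """
    return {key: rules[key](value) if key in rules else value
            for key, value in record.items()}
```

Calling `redact({"name": "Ada", "email": "ada@example.com"})` under these assumed rules leaves `name` untouched and masks the local part of `email`.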
Lastly, the Comply part of the architecture is what gives users the ability to query the data. We ensure that all users are authenticated by using either Apache Knox or Nginx as the HTTPS gateway, together with Keycloak, which authenticates users and generates a JSON Web Token (JWT) that can then be used to track user queries throughout the system. Keycloak can authenticate users against a variety of external (OpenID, SAML, OAuth2) as well as internal sources (e.g. Active Directory). User queries can easily be modified to suit users' needs without writing any new code.
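As a rough illustration of how a JWT can be used to track a query, the Python sketch below reads the standard `sub` (subject) and `jti` (token ID) claims from a token and attaches them to an audit record. The `audit_entry` function and the shape of the record are assumptions made for this example, not the product's actual format. Note that the sketch does not verify the token's signature; it assumes the gateway has already done so, and it should not be used on untrusted tokens.

```python
import base64
import json
from typing import Any, Dict


def _b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def jwt_claims(token: str) -> Dict[str, Any]:
    """Return the payload claims of a JWT WITHOUT verifying its signature.

    Assumption: signature and expiry checks already happened upstream.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a token of the form header.payload.signature")
    return json.loads(_b64url_decode(parts[1]))


def audit_entry(token: str, query: str) -> Dict[str, Any]:
    """Build a hypothetical audit record linking a query to the token's user."""
    claims = jwt_claims(token)
    return {
        "user": claims.get("sub"),      # who issued the query
        "token_id": claims.get("jti"),  # which token was used, if present
        "query": query,
    }
```

Each component that handles the request could log such an entry, so that a single query can be followed end to end by its `user` and `token_id` values.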