Thursday, March 20, 2025

Compliance!

Compliance

Compliance means a company, and its employees, are following the rules so it doesn't get punished by regulators (e.g. fines), courts (e.g. adverse legal judgments), or the market (stock price drop), or all of them. The rules include financial and privacy law, but also the terms of contracts. On the face of it, this all sounds like something only the finance and legal departments need to worry about, but increasingly data people (analysts, data scientists, data engineers) need to follow compliance rules too. In this blog post, I'll explain why compliance applies to you (data people) and what you can do about it.

(Get compliance wrong, and someone like this may be in your future. InfoGibraltar, CC BY 2.0, via Wikimedia Commons)

I'm not a lawyer, so don't take legal advice from me. What you should do is read this blog post, think about gaps in your compliance processes, and talk to your legal team.

Private data on people

By now, most data people understand that data that identifies individuals is covered by privacy laws and needs to be handled carefully. Data people also understand that there can be large fines for breaches or mishandling data. Unfortunately, this understanding often isn't enough: privacy laws are more complex and broader than many technical staff realize.

(Private Property sign by Oast House Archive, CC BY-SA 2.0 <https://creativecommons.org/licenses/by-sa/2.0>, via Wikimedia Commons)

Several data privacy laws, most notably the GDPR, have an extraterritorial provision, which means the law applies anywhere in the world. For example, a Mexican company processing data on French residents is covered by the GDPR even though the data processing takes place in Mexico. For a company operating internationally, this means obeying several sets of laws, which in practice means the strictest rules are used for everyone.

What counts as personally identifiable information (PII) isn't always clear and can change suddenly. Most famously, the Court of Justice of the European Union (CJEU) ruled in the Breyer case that IP addresses can be PII under some circumstances. I'm not going to dive into the ruling here (you can look it up), but the court's logic is clear. What this ruling illustrates is that "common sense" views of what is and is not PII aren't good enough.

The GDPR defines a subset of data on people as "special categories of personal data" which are subject to more stringent regulation (this guide has more details). This includes data on sexuality, religion, political activities etc. Once again, this seems obvious in theory, but in practice is much harder. For example, the name of someone's partner can reveal their sexuality and is therefore sensitive data.

There are two types of private data on people companies handle that are often overlooked. Employee data is clearly private, but is usually closely held for obvious reasons. Customer data in CRM systems is also private data on people but tends to be less protected. Most CRM systems have prospect and contact names, job titles, phone numbers etc. and I've even heard of systems that list customers' hobbies and interests. Data protection rules apply to these systems too.

I've only just scratched the surface of the rules surrounding processing data on people but hopefully I've made clear that things aren't as straightforward as they appear. A company can break the law and be fined if its staff (e.g. data analysts, data scientists, data engineers etc.) handle data in a way contrary to the law.

Trading based on confidential information

Many companies provide services to other companies, e.g. HR, payroll, internet, etc. This gives service providers' employees access to confidential information on their customers. If you're a service provider, should you let your employees make securities transactions based on confidential customer information?

(Harshitha BN, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons)

A hypothetical case can make the risks clearer. Let's imagine a payroll company provides services to other companies, including several large companies. A data analyst at the payroll company spots ahead of time that one of their customers is laying off a large number of its employees. The data analyst trades securities in that company based on this confidential information. Later on, the fact that the data analyst made those trades becomes publicly known.

There are several possible consequences here.

  • Depending on the jurisdiction, this may count as "insider trading" and be illegal. It could lead to arrests and consequential bad publicity and reputational damage.
  • This could be a breach of contract and could lead to the service provider losing a customer.
  • At the very least, there will be commercial repercussions because the service provider has violated customer trust.

Imagine you're a company providing services to other companies. Regardless of the law, do you think it's a good idea for your employees to be buying or selling securities based on their confidential customer knowledge?

Legal contracts

This is a trickier area and gets companies into trouble. It's easiest if I give you a hypothetical case and point out the problems.

(Staselnik, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons)

A company, ServiceCo, sells services into the mining industry in different countries. As part of its services, it sells a "MiningNetwork" product that lists mining companies and the names of people in various jobs in them (e.g. safety officers, geologists and so on). It also produces regular reports on the mining industry, called the "Mining Today Report", that it makes available for free on its website as part of its marketing efforts.

For sales prospecting purposes, the sales team buys data from a global information vendor called GlobalData. The data ServiceCo buys lists all the mines owned by different companies (including joint ventures etc.) and has information on those mines (locations, what's being mined, workforce size etc). It also lists key employees at each of those mines. This data is very expensive, in part because it costs GlobalData a great deal of money to collect. The ServiceCo sales team incorporates the GlobalData data into their CRM and successfully goes prospecting. Although the data is expensive, the sales team is extracting value from it and it's worth it to them.

Some time later, a ServiceCo data analyst finds this data in an internal database and they realize it could be useful elsewhere. In conjunction with product management, they execute a plan to use it:

  • They augment the "MiningNetwork" product with GlobalData data. Some of this data ServiceCo already had, but the GlobalData adds new mine sites and new people and is a very significant addition. The data added is directly from the GlobalData data without further processing.
  • They augment their free "Mining Today Report" with the GlobalData data. In this case, it's a very substantial upgrade, increasing the scope of the report by 50% or more. In some cases, the additions to the report are based on conclusions drawn from the GlobalData data; in other cases, it's a direct lift (e.g. mine locations).

Just prior to release, the analyst and the product manager report this work to the ServiceCo CTO and CEO in an internal pre-release demo call. The analyst is really happy to point out that this is a substantial new use for data that the company is paying a great deal of money for.

You are the CEO of ServiceCo. What do you do next and why?

Here's my answer. You ask the data analyst and the product manager if they've done a contract review with your legal team to check that this use of GlobalData's data is within the terms of the contract. You ask for the name of the lawyer they've worked with and you speak to the lawyer before the release goes out. If the answer isn't satisfactory, you stop the projects immediately regardless of any pre-announcements that have been made. 

Why?

These two projects could put the company in substantial legal jeopardy. When you buy data, it usually comes with an agreement specifying allowed uses. Anything else is forbidden. In this case, the data was bought for sales prospecting purposes from a large and experienced data supplier (GlobalData). It's very likely that usage of this data will be restricted to sales prospecting and for internal use only. Bear in mind, GlobalData may well be selling the same sort of data to mining companies and other companies selling to mining companies. So there are likely two problems here:

  1. The GlobalData data will be used for purposes beyond the original license agreement.
  2. The GlobalData data will be distributed to other companies free of charge (in the case of "Mining Today Report"), or for a charge ("MiningNetwork") with no royalty payments to GlobalData. In effect, ServiceCo will go from being a user of GlobalData data to distributing GlobalData's data without paying them. ServiceCo will be doing this without an explicit agreement from GlobalData. This may well substantially damage GlobalData's business.

The second point is the most serious and could result in a lawsuit with substantial penalties.

The bottom line is simple. When you buy data, it comes with restrictions on how you use it. It's up to you to stick to the rules. If you don't, you may well get sued.

(I haven't mentioned "open source" data so far. Many freely available data sets have licensing provisions that forbid commercial use of the data. If that's the case, you can't use it for commercial purposes. Again, the onus is on you to check and comply.)
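One way to make "check before you use it" routine is to keep a register of what each purchased data set is licensed for and test proposed uses against it. The sketch below is only an illustration of that idea: the `DataLicense` class, the register contents, and the purpose names are all hypothetical, and a real register would be filled in with your legal team from the actual contract terms. It flags uses for legal review; it doesn't replace the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLicense:
    """Allowed uses for one purchased data set (all values are hypothetical)."""
    vendor: str
    allowed_uses: frozenset
    redistribution_allowed: bool = False


# Hypothetical register, maintained with the legal team.
LICENSE_REGISTER = {
    "globaldata_mines": DataLicense(
        vendor="GlobalData",
        allowed_uses=frozenset({"sales_prospecting"}),
        redistribution_allowed=False,
    ),
}


def check_use(dataset: str, purpose: str, external: bool) -> list:
    """Return the reasons a proposed use needs legal review (empty list = none found)."""
    license_ = LICENSE_REGISTER.get(dataset)
    if license_ is None:
        return [f"{dataset}: no license on record, ask legal before using it"]
    problems = []
    if purpose not in license_.allowed_uses:
        problems.append(f"{dataset}: '{purpose}' is not a licensed use")
    if external and not license_.redistribution_allowed:
        problems.append(f"{dataset}: redistribution outside the company is not licensed")
    return problems


# The sales team's CRM use versus the proposed free report.
crm_use = check_use("globaldata_mines", "sales_prospecting", external=False)
report_use = check_use("globaldata_mines", "marketing_report", external=True)
print(crm_use)
print(report_use)
```

In the ServiceCo story, the CRM use raises no flags, while the report raises two: a new purpose and external distribution.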

What can you do about it?

Fortunately, there are things you can do to manage the risk. Most of the actions revolve around having a repeatable process and/or controls. The nice thing about process and controls is that, if something does go wrong, you can often reduce the impact. For example, if you breach the GDPR, you can show you treated data protection seriously and argue for a lesser fine.

Let's look at some of the actions you should consider to manage data compliance risk.

Education

Everyone who handles data needs to go through training. This should include:

  • Privacy and PII training.
  • Trading on confidential information.
  • Rules around handling bought-in data.

Initially, everyone needs to be trained, but that training needs to be refreshed every year or so. Of course, new employees must be trained too.

Restricted access/queries

Who has access to data needs to be regulated and controlled. For example, who needs to have access to CRM data? Plainly, the sales and marketing teams and the engineering people supporting the product, but who else? Who should not have access to the data? The first step here is to audit access, the second step is to control access, the third step is to set up a continuous monitoring process.
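The audit step can start very simply: list who currently has access and compare it with who should. Here's a minimal sketch, assuming you can export current grants as (user, team, data store) rows. The team names, store names, and the policy itself are made up for illustration.

```python
# Hypothetical policy: which teams may read which data stores.
ALLOWED_TEAMS = {
    "crm": {"sales", "marketing", "product_engineering"},
    "payroll": {"hr"},
}

# Hypothetical export of current grants: (user, team, data store).
CURRENT_GRANTS = [
    ("alice", "sales", "crm"),
    ("bob", "data_science", "crm"),
    ("carol", "hr", "payroll"),
    ("dave", "marketing", "payroll"),
]


def audit_access(grants, allowed_teams):
    """Return grants whose team is not on the allowed list for that store."""
    violations = []
    for user, team, store in grants:
        if team not in allowed_teams.get(store, set()):
            violations.append((user, team, store))
    return violations


violations = audit_access(CURRENT_GRANTS, ALLOWED_TEAMS)
for user, team, store in violations:
    print(f"review: {user} ({team}) has access to {store}")
```

Running the same check on a schedule is one cheap way to get from a one-off audit to continuous monitoring.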

A piece that's often missed is controlling the nature of queries run on the data. The GDPR limits querying on PII data to areas of legitimate business interest. An analyst may well run "initiative" queries to see if the company could extract more value from the data, and that could be problematic. The solution here is education and supervision.

Encryption

There's an old cybersecurity mantra: "encrypt data at rest, encrypt data in transit". Your data needs to be encrypted by an appropriately secure algorithm and not one susceptible to rainbow table or other attacks.
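Rainbow tables are, strictly, an attack on unsalted hashes rather than on encryption as such, so one reading of the warning is "don't protect PII with a plain hash". The sketch below illustrates that reading using only Python's standard library. It uses a precomputed lookup table (the simplest relative of a rainbow table) against a deliberately tiny input space, and the 4-digit value and the key handling are assumptions for illustration only.

```python
import hashlib
import hmac
import secrets


def unsalted_hash(value: str) -> str:
    """Plain SHA-256: the same input always gives the same, guessable output."""
    return hashlib.sha256(value.encode()).hexdigest()


def keyed_hash(value: str, key: bytes) -> str:
    """HMAC-SHA-256: the output can't be precomputed without the secret key."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


# An attacker precomputes the hash of every possible 4-digit value.
lookup = {unsalted_hash(f"{n:04d}"): f"{n:04d}" for n in range(10_000)}

# A stored unsalted hash is reversed by a simple dictionary lookup.
stored = unsalted_hash("4821")
recovered = lookup.get(stored)

# With a secret key, the precomputed table no longer matches anything.
key = secrets.token_bytes(32)  # in practice, kept in a secrets manager
protected = keyed_hash("4821", key)
recovered_protected = lookup.get(protected)

print(recovered)
print(recovered_protected)
```

The same weakness applies to anything with a small input space, such as phone numbers or dates of birth.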

Related to encryption is the idea of pseudonymization. To put it simply, this replaces key PII with a string, e.g. "John Smith" might be replaced with "Qe234-6jDfG-j56da-9M02sd". Similarly, we might replace his passport number, his credit card number, his IP address, his account number, and so on, each with its own string. The mapping of this PII data to strings is held in a database table with very, very restricted access.
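To make the mapping idea concrete, here's a minimal sketch. The `Pseudonymizer` class is hypothetical, and its in-memory dictionaries stand in for the restricted database table described above. A real system would also need access control, persistence, and key management, none of which are shown.

```python
import secrets


class Pseudonymizer:
    """Maps PII values to random tokens; the mapping itself must be tightly restricted."""

    def __init__(self):
        # Stand-ins for the restricted database table.
        self._token_by_value = {}
        self._value_by_token = {}

    def pseudonymize(self, value: str) -> str:
        """Return the existing token for a value, or create a new random one."""
        token = self._token_by_value.get(value)
        if token is None:
            token = secrets.token_urlsafe(16)  # random, so it reveals nothing about the value
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return token

    def reidentify(self, token: str) -> str:
        """Reverse lookup; in practice only a few authorized people could call this."""
        return self._value_by_token[token]


vault = Pseudonymizer()
token = vault.pseudonymize("John Smith")
same_token = vault.pseudonymize("John Smith")
other_token = vault.pseudonymize("Jane Doe")
print(token == same_token, token == other_token)
```

The same person always gets the same token, which is what lets analysts join and count records without ever seeing the underlying name.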

As it turns out, almost all analysis you might want to do on PII data works equally well with pseudonymization. For example, let's say you're a consumer company and you want to know how many customers you have in a city. You don't actually need to know who they are, you just need counts. You can count unique strings just the same as you can count unique names. 
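The customers-per-city example can be sketched in a few lines. The records and token values below are made up; the point is only that the count works on tokens without any names being present.

```python
from collections import defaultdict

# Hypothetical pseudonymized records: (customer token, city). No names needed.
records = [
    ("tok_a1", "Boston"),
    ("tok_b2", "Boston"),
    ("tok_a1", "Boston"),  # same customer, second order
    ("tok_c3", "Chicago"),
]


def customers_per_city(rows):
    """Count distinct customer tokens in each city."""
    tokens_by_city = defaultdict(set)
    for token, city in rows:
        tokens_by_city[city].add(token)
    return {city: len(tokens) for city, tokens in tokens_by_city.items()}


counts = customers_per_city(records)
print(counts)  # {'Boston': 2, 'Chicago': 1}
```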

There's a lot more to say about this technique, but all I'm going to say now is that you should be using it.

Audit

This is the same as any audit: you go through the organization with a set of questions and checks. An audit is a good idea as an initial activity, but tends to be disruptive. After the initial audit, I favor annual spot checks.

Standards compliance

There are a ton of standards out there covering data compliance: SOC 2, NIST, ISO 27000, FedRAMP, etc. It's highly likely that an organization will have to comply with one or more of them. You could try to deal with many/most compliance issues by conforming to a standard, but be aware that will still leave gaps. The problem with complying with a standard is that the certification becomes the goal rather than reducing risk. Standards are not enough.

Help line

A lot of these issues are hard for technical people to understand. They need ongoing support and guidance. A good idea is to ensure they know who to turn to to get help. This process needs to be quick and easy. 

(Something to watch out for is management retaliation. Let's say a senior analyst thinks a use of data breaches legal terms but their manager tells them to do nothing. The analyst reaches out to the legal team who confirms that the intended use is a breach. The manager cannot be allowed to retaliate against the analyst.)

The bottom line

As a technical person, you need to treat this stuff seriously. Assuming "common sense" can get you into a lot of trouble. Make friends with your legal team; they're there to help you.
