Sensitive data flow discovery

Bearer detects and classifies sensitive data flows by scanning your source code with custom-built static code analyzers. You can learn more about how we built Bearer's detection capabilities in this blog article.
Sensitive data flow discovery is performed by detecting two primary elements: sensitive data stored and processed, and components associated with the data (data stores, internal APIs, external APIs).
Bearer never has access to your source code. All static code analysis is performed on-premise through our broker image.

How does sensitive data detection work?

Bearer detects sensitive data through two main methods:
  • Understanding your code base by analyzing class names, methods/functions, variables, properties or attributes and tying all of those together to detect data structures.
  • Analyzing structured data definitions such as OpenAPI, SQL, GraphQL or Protobuf files.
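
As an illustration of the first method, here is a minimal Python sketch of name-based detection. It is not Bearer's implementation: the SENSITIVE_TOKENS table and the split_identifier and detect_fields helpers are hypothetical, and a real analyzer would combine many more signals than field names alone.

```python
import re

# Hypothetical keyword table (an assumption for this sketch, not Bearer's rules).
SENSITIVE_TOKENS = {
    ("email",): "Email Address",
    ("first", "name"): "First name",
    ("last", "name"): "Last name",
    ("passport", "number"): "Passport Number",
}


def split_identifier(name):
    """Split snake_case, kebab-case and camelCase identifiers into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def detect_fields(class_name, fields):
    """Return (class, field, data type) for each field whose tokens match a pattern."""
    findings = []
    for field in fields:
        tokens = split_identifier(field)
        for pattern, data_type in SENSITIVE_TOKENS.items():
            n = len(pattern)
            # Look for the pattern as a contiguous run of tokens.
            if any(tuple(tokens[i:i + n]) == pattern
                   for i in range(len(tokens) - n + 1)):
                findings.append((class_name, field, data_type))
    return findings
```

Calling detect_fields("User", ["firstName", "email_address", "created_at"]) reports the first two fields and ignores the third. Tying each field back to its class is what lets the findings be grouped into a data structure.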

How does sensitive data classification work?

Sensitive data detected by our static code analyzers is classified using custom-built classifiers. The first step is to ensure we are in the presence of an actual data type, and the second step is to classify the exact data type.
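
The two-step idea can be sketched in Python as follows. This is only one reading of the steps, with assumptions stated up front: step one is taken to mean rejecting candidates that merely mention a data type (a template, a URL), and the NON_DATA_SUFFIXES and DATA_TYPES tables are hypothetical placeholders for the real heuristics.

```python
# Hypothetical suffixes suggesting the identifier is metadata, not the data itself.
NON_DATA_SUFFIXES = ("template", "url", "validator", "count")

# Hypothetical mapping from a keyword to a data type name.
DATA_TYPES = {
    "email": "Email Address",
    "password": "Passwords",
    "salary": "Salary",
}


def is_actual_data(tokens):
    """Step 1: reject empty candidates and those ending in a metadata suffix."""
    return bool(tokens) and tokens[-1] not in NON_DATA_SUFFIXES


def classify(tokens):
    """Step 2: return the first matching data type, or None if nothing matches."""
    if not is_actual_data(tokens):
        return None
    for token in tokens:
        if token in DATA_TYPES:
            return DATA_TYPES[token]
    return None
```

With these tables, classify(["user", "email"]) yields "Email Address", while classify(["email", "template"]) yields None because the first step filters it out before any data type is assigned.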
Bearer supports 130+ data types out of the box, covering PD, PII, PHI and financial data. For each data type, up to a dozen heuristics are used. The currently supported data types are listed below:
Data types
Passwords, PIN, Mother's Maiden Name, Browsing Behavior, Telephone Recordings, Voice Mail, Emails, IP address, MAC address, Device identifier, Browser Fingerprint, Email Address, Physical Address, Telephone Number, Credit Records, Credit Worthiness, Credit Standing, Credit Capacity, Convictions, Charges, Pardons, Age Range, Physical Traits, Income Brackets, Geographic, Biometric Data, Race, National origin, Ethnic Origin, Spoken Languages, Accents, Family Structure, Siblings, Offspring, Marriages, Divorces, Relationships, Credit Card Number, Bank Account, First name, Last name, Full name, Username, Unique Identifier, Passport Number, ID Number, Call Logs, Links clicked, Demeanor, Attitude, Religious Beliefs, Philosophical beliefs, Thoughts, Knowledge, Country, GPS Coordinate, Room Number, Physical and mental health, Drug test results, Disabilities, Family health history, Personal health history, Health Records, Blood Type, DNA code, Prescriptions, Cars, Houses, Apartments, Personal Possessions, Height, Weight, Age, Hair Color, Skin Tone, Tattoos, Gender, Piercings, Opinions, Intentions, Interests, Favorite Foods, Colors, Likes, Dislikes, Music, Job Titles, Salary, Work History, School attended, Employee Files, Employment History, Evaluations, References, Interviews, Certifications, Disciplinary Actions, Character, General Reputation, Social Status, Marital Status, Religion, Political Affiliation, Interactions, Gender identity, Sexual Preferences, Sexual History, Friends, Connections, Acquaintances, Associations, Group Membership, Purchases, Sales, Credit, Income, Loan Records, Transactions, Taxes, Purchases and Spending Habits, Image, Conversation

How does component detection work?

Bearer detects components that process and store data through several methods:
  • Open source package dependencies.
  • External API calls.
  • Internal API calls, including correlating OpenAPI path definitions between repositories.
  • Framework-specific conventions, e.g. the database.yml file in Rails.
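
To make the first of these methods concrete, the Python sketch below reads the dependencies of a package.json manifest and looks them up in a small table. Everything here is an assumption for illustration: KNOWN_PACKAGES is a hypothetical three-entry stand-in for a real dependency database, and detect_components is not part of any Bearer API.

```python
import json

# Hypothetical slice of a dependency database: package name -> (component, kind).
KNOWN_PACKAGES = {
    "pg": ("PostgreSQL", "data store"),
    "redis": ("Redis", "data store"),
    "stripe": ("Stripe", "external API"),
}


def detect_components(package_json_text):
    """Return the sorted components implied by a package.json manifest."""
    manifest = json.loads(package_json_text)
    # Merge runtime and development dependencies into one name -> version map.
    deps = {
        **manifest.get("dependencies", {}),
        **manifest.get("devDependencies", {}),
    }
    return sorted(KNOWN_PACKAGES[name] for name in deps if name in KNOWN_PACKAGES)
```

A manifest that depends on pg, express and stripe would produce the PostgreSQL and Stripe components. Packages missing from the table, such as express in this sketch, are simply skipped.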

How does component classification work?

Bearer classifies open source packages using a custom-built internal database of 20k dependencies. External domains are classified with a custom algorithm based on 21 checks.

How do we ensure high precision?

Static code analysis, and classification in general, are subject to false positives. We have spent, and continue to spend, significant effort reducing false positives to a minimum, essentially optimizing for true positives. Results vary with the previously unseen data sets tested, but on average they provide ~93% accuracy.
Besides regularly testing new data sets and improving our heuristics, we added a data test suite to our CI to make sure regressions are not introduced.
Here is what this data test suite looks like:
Language      Data set size   % data types covered
C#            3256            58%
Java          1648            53%
PHP           503             63%
JavaScript    518             63%
TypeScript    334             68%
Go            1648            53%
Python        219             86%
Ruby          419             56%
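
A regression check of the kind described above might look like the following Python sketch. It is an assumption-laden illustration rather than the actual test suite: the accuracy and check_no_regression functions, the dictionary-based data set format and the baseline value are all hypothetical.

```python
def accuracy(predictions, labels):
    """Share of labelled samples whose predicted data type matches the label.

    Both arguments map a sample key to a data type name, or to None when the
    sample is not sensitive. A missing prediction counts as None.
    """
    if not labels:
        raise ValueError("empty data set")
    correct = sum(
        1 for key, expected in labels.items() if predictions.get(key) == expected
    )
    return correct / len(labels)


def check_no_regression(predictions, labels, baseline):
    """Raise AssertionError if accuracy falls below the recorded baseline."""
    score = accuracy(predictions, labels)
    if score < baseline:
        raise AssertionError(
            "accuracy {:.2%} is below baseline {:.2%}".format(score, baseline)
        )
    return score
```

Run in CI against a labelled data set per language, a check like this fails the build whenever a heuristic change lowers the score below the stored baseline.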