Tuesday, April 21, 2015

Millions of accounts are being compromised because developers don't have a specialised user database

One of the reasons that hacking incidents are so bad is that user data is stored in a normal database of some form. SQL, NoSQL, it doesn't really matter which one; they are all unsuitable. Typically the user data sits in a table right alongside the rest of the application data. If a hacker gets access to the database machine or to the database query API, then the hacker has unlimited access to download user data.


This results in cases in which millions of user accounts are compromised. Another problem is that developers roll their own user and password management systems and get things like salting and hashing wrong, making the data vulnerable.

What developers need is a minimal, single-purpose database specifically designed for protecting user information and for moving user data access away from the rest of the application data, to minimise the impact of access by hackers.

Here are the requirements:
  • It should be accessible only via its specialised API which is designed to constrain the ways that it is accessed.
  • It should not provide generalised database query functionality. 
  • Its API should have password salting and hashing built in.
  • Its API should throttle access with some sort of algorithm designed to prevent downloads of large quantities of user data.
  • It should encrypt data internally.
  • It should communicate only over encrypted connections.
  • It should be distributed.
  • It should not be run on any web server, should run "behind the scenes" and be accessible only via its API.
  • It should include triggers and alerts based on uncommon access patterns or recognised nefarious access patterns.
  • It should have no other purpose.
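
To make a few of these requirements concrete, here is a minimal sketch in Python of what the constrained API might look like. It covers only three of the points above: no general query functionality, built-in salting and hashing, and throttling. Every name in it (`UserStore`, `Throttled`, `register`, `verify`) is hypothetical, the in-memory dict stands in for real encrypted storage, and the limits and the PBKDF2 iteration count are assumptions rather than recommendations from this post.

```python
import hashlib
import hmac
import os
import time


class Throttled(Exception):
    """Raised when one username sees too many verification attempts."""


class UserStore:
    """Hypothetical single-purpose credential store with no query or listing API."""

    def __init__(self, max_attempts=5, window_seconds=60.0,
                 iterations=600_000, clock=time.monotonic):
        self._records = {}    # username -> (salt, derived key); never returned to callers
        self._attempts = {}   # username -> timestamps of recent verify() calls
        self._max_attempts = max_attempts
        self._window = window_seconds
        self._iterations = iterations  # assumed work factor, tune for your hardware
        self._clock = clock

    def _derive(self, password, salt):
        # Salted, deliberately slow key derivation from the standard library.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                   salt, self._iterations)

    def register(self, username, password):
        if username in self._records:
            raise ValueError("username already registered")
        salt = os.urandom(16)  # fresh random salt per user
        self._records[username] = (salt, self._derive(password, salt))

    def verify(self, username, password):
        """Return True or False; raise Throttled once the attempt budget is spent."""
        now = self._clock()
        recent = [t for t in self._attempts.get(username, [])
                  if now - t < self._window]
        if len(recent) >= self._max_attempts:
            self._attempts[username] = recent
            raise Throttled(username)
        recent.append(now)
        self._attempts[username] = recent

        record = self._records.get(username)
        if record is None:
            # Do the same work for unknown users so timing reveals less.
            self._derive(password, b"\x00" * 16)
            return False
        salt, stored = record
        return hmac.compare_digest(stored, self._derive(password, salt))
```

The only things a caller can do are register a user and ask a yes/no question about one username at a time. There is deliberately no method that lists users or returns a stored hash, so even a caller with full API access cannot pull the table down in bulk. A real implementation would also need to bound the size of the attempt log and throttle per calling client, not just per username.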

Something like this wouldn't be a guarantee against being hacked but would be a good baseline for preventing common problems and minimising the outcomes of the seemingly inevitable hacks that we hear about all the time. Someone clever should write this. I'd use it.

I bet there's someone out there smart enough to put this together in a matter of hours.


  1. Welcome to 1996, allow me to introduce you to this new-fangled thing called LDAP...

    meanwhile: "Its API should have password salting and hashing built in." - no, the *API* should not. The internal storage and password validation mechanism should of course use them, but they should not be visible in the API nor to end-users. Everything about password storage should be opaque at the API/user level, so that the internals are free to be updated to stronger mechanisms over time.
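
    One way to read the point about opaque storage, sketched in Python under stated assumptions: the caller only ever sees a yes/no answer plus an opaque record string, and the store quietly upgrades weak records on a successful login. The record layout, the function names `make_record` and `check_password`, and the iteration counts are all hypothetical; this is not how OpenLDAP or any particular server does it.

```python
import hashlib
import hmac
import os

CURRENT_ITERATIONS = 600_000  # assumed current work factor; raise it over time


def _encode(password, salt, iterations):
    # Opaque record: scheme, work factor, salt and digest in one string.
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def make_record(password, iterations=CURRENT_ITERATIONS):
    """Create a new opaque record with a fresh random salt."""
    return _encode(password, os.urandom(16), iterations)


def check_password(password, record, current_iterations=CURRENT_ITERATIONS):
    """Return (ok, record_to_store).

    On success with an outdated work factor, record_to_store is a freshly
    derived, stronger record; otherwise it is the record passed in.
    """
    scheme, iterations, salt_hex, _digest = record.split("$")
    if scheme != "pbkdf2_sha256":
        raise ValueError("unknown scheme: " + scheme)
    iterations = int(iterations)
    candidate = _encode(password, bytes.fromhex(salt_hex), iterations)
    if not hmac.compare_digest(candidate, record):
        return False, record
    if iterations < current_iterations:
        return True, make_record(password, current_iterations)
    return True, record
```

    Because the scheme and work factor live inside the record and never appear in the API, the internals can move to a stronger mechanism later without any caller changing.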

    Aside from that, yeah - OpenLDAP does all that. Has been doing it for over a decade.

    1. Why do you think OpenLDAP isn't used as the solution for this problem?

    2. Because "not invented here" is incredibly common in the industry. Many people cut their teeth on basic LAMP stacks with no experience in the real world. So they enter the market building these complicated, fancy web apps with 0 real-world experience.

    3. f**** you got me. Exactly the answer I wanted to provide.

      Except most developers don't understand the two-bind mechanism:
      - an anonymous one for finding the entry (based on restricted access on the realm)
      - an auth'd one on the auth data (profile).

      LDAP being a network protocol, it totally supports using an SQL backend (or whatever backend you want).

      But as my fellow developers say: it is too old and too complex.

    4. It's not only too old and too complex, it also /does not scale/ without herculean efforts to ensure database sanity across replicas. OpenLDAP is a poster child for this -- its HDB backend is designed to handle replication, but is just as likely to corrupt itself as it is to actually replicate cleanly.

    5. Active Directory can scale to literally millions of objects.

  2. You do realize that most of what you said can be accomplished using any kind of database? Just implement the details yourself. Have a login server that is not connected directly to the internet, and use a VPN to access a specialized interface that is able to check the login details.

    One problem arises when you separate the user information from the rest of the data: there are no foreign key constraints between the two, i.e. no "data consistency".

    1. One of the requirements was "no generalised query API". The whole point is not to use a database that can be queried in any normal way.

    2. I do understand what you are saying, but if you restrict access to a specific interface, the data can only be accessed via that interface and not in any other way.

  3. LDAP can be painful to work with, even when using such "industry standards" as Active Directory.

    I think a specialised server (or even Postgres extension) could be useful but couldn't you set up the following?

    Create a stored proc or similar that does this: SELECT COUNT(PasswordHash) WHERE Username = @Username AND PasswordHash = @PasswordHash

    Create a user account on the database that only has EXEC permissions, not SELECT or anything else

    Restrict the user account to only be able to log in from the app servers (or whatever)

    Restrict the DB admin to only log in via a jumphost.

    1. It can be done using microservices in front of the login servers, which are not accessible from the public network.

    2. AD/LDAP authentication is literally a simple bind with the user's credential info submitted to the LDAP server over TLS. You'll get a success/fail back from the AD server. I'm not really sure how this is difficult to work with?

      When using TLS, you can even allow users to update their password via your application: https://support.microsoft.com/en-us/kb/269190/

      This isn't difficult, and nothing new to the industry.

  4. Use a proper identity provider like Azure Active Directory instead of your own DB.