HMDA

Building a Distributed Platform for Collecting U.S. Mortgage Data

Created by @jmarinotero

Juan Marin Otero

  • Solutions Architect, HMDA Ops Team, CFPB
  • Joined Chief November 2014

Hum-What??

Home Mortgage Disclosure Act

The Home Mortgage Disclosure Act (or HMDA, pronounced HUM-duh) is a United States federal law that requires certain financial institutions to provide mortgage data to the public. Congress enacted HMDA in 1975.

- Wikipedia

HMDA

Main objectives

  • Determine whether financial institutions are serving the housing needs of their communities
  • Assist public officials in distributing public-sector investments to attract private investment where it is needed
  • Identify possible discriminatory lending practices

HMDA Today

  • Legacy system
  • Convoluted filing process; paper / fax still used
  • Publication of data takes months (should take hours)

HMDA: The system we are building

  • Large data collection (tens of millions of records)
  • Many entities submit data (several thousand)
  • 30 years of historical data
  • Flexible database schema to accommodate rule changes
  • Three main work streams: data intake, data management and publication

Additionally:

  • Minimize time to market
  • Cost effective (budget)
  • Reactive manifesto: responsive, resilient, elastic and message driven
  • We will log and measure everything
  • Fault tolerant

Some additional challenges

  • Most financial institutions file at the last minute, causing a large spike in usage
  • Some "time to market" deadlines are statutory; we can't fail here
  • We have a budget; we can't scale by throwing money at this
  • No one wants to experience Healthcare.gov again ==> Uptime!

Not a website

HMDA Backend

LAR: Loan Application Register

It looks like this:

2|8800009923|3|8299422144|20170613|1|2|2|1|5|3|4|20170719|NA |NA|NA |NA |2|2|3| | | | |3| | | | |1|2|37|0| | | |NA |2|1


We get 10 to 20 million rows like this every year
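A minimal sketch of how one such row could be split into raw fields, assuming the pipe character is the only delimiter and that padding spaces carry no meaning. The names `LarLine` and `splitFields` are hypothetical and not taken from the HMDA code base:

```scala
object LarLine {
  // Sample row copied from the slide above.
  val sample: String =
    "2|8800009923|3|8299422144|20170613|1|2|2|1|5|3|4|20170719|NA |NA|NA |NA |2|2|3| | | | |3| | | | |1|2|37|0| | | |NA |2|1"

  // Split on the pipe delimiter. The -1 limit keeps trailing empty fields,
  // and trim removes the padding seen in values such as "NA ".
  def splitFields(line: String): Vector[String] =
    line.split("\\|", -1).map(_.trim).toVector
}
```

Blank fields come back as empty strings, so a later step can decide whether a blank is legal in that position.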

HMDA Model


    case class LoanApplicationRegister(
      id: Int,
      respondentId: String,
      agencyCode: Int,
      loan: Loan,
      preapprovals: Int,
      actionTakenType: Int,
      actionTakenDate: Int,
      geography: Geography,
      applicant: Applicant,
      purchaserType: Int,
      denial: Denial,
      rateSpread: String,
      hoepaStatus: Int,
      lienStatus: Int
    )

Edit Checks

This data needs to be parsed from the specified format. In addition, every record goes through a series of "edits", or validation checks, for data integrity. There are currently 156 edits. Around 30 operate on the whole file (they need to check all the data, summarize it, etc.). The rest (~120) operate on every line.
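The two kinds of edit differ mainly in what they take as input. A rough sketch, assuming a much-simplified record type: `SimpleLar`, `LineEdit` and `FileEdit` are illustrative names, and both example checks are invented for illustration rather than actual HMDA edits.

```scala
// Simplified stand-in for a parsed record; not the real HMDA model.
final case class SimpleLar(respondentId: String, loanAmount: Int)

// A line-level edit looks at one record in isolation.
final case class LineEdit(name: String, check: SimpleLar => Boolean)

// A file-level edit needs every record in the submission.
final case class FileEdit(name: String, check: Seq[SimpleLar] => Boolean)

object Edits {
  // Hypothetical line-level check: the loan amount must be positive.
  val positiveAmount: LineEdit =
    LineEdit("positive-amount", lar => lar.loanAmount > 0)

  // Hypothetical file-level check: all rows come from a single respondent.
  val singleRespondent: FileEdit =
    FileEdit("single-respondent", lars => lars.map(_.respondentId).distinct.size <= 1)

  // Returns a description of each failed edit; line edits report the row index.
  def run(lars: Seq[SimpleLar], lineEdits: Seq[LineEdit], fileEdits: Seq[FileEdit]): Seq[String] = {
    val lineFailures = for {
      (lar, idx) <- lars.zipWithIndex
      edit <- lineEdits
      if !edit.check(lar)
    } yield s"line $idx: ${edit.name}"
    val fileFailures = fileEdits.filterNot(_.check(lars)).map(e => s"file: ${e.name}")
    lineFailures ++ fileFailures
  }
}
```

Line edits can run as soon as a single row is parsed, while file edits have to wait until the whole submission is available.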

Rule Engine

We wrote a custom Rule Engine that could execute these rules. We wanted it to have some unique properties:

  • Should execute rules in real time
  • Should be able to run in parallel, in a cluster
  • Rules should be easy to write and maintain
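One way to picture the first two properties is to validate each line as it arrives and fan the work out across threads. The sketch below uses plain `scala.concurrent.Future` on a single machine as a stand-in. It is not the engine's implementation, and running across a cluster would need a distribution layer that this example does not show. `Rule`, `validateLine` and `validateAll` are hypothetical names.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ParallelRules {
  // A rule is a name plus a predicate over one raw line (illustrative only).
  final case class Rule(name: String, passes: String => Boolean)

  // Names of the rules that a single line fails.
  def validateLine(rules: Seq[Rule], line: String): Seq[String] =
    rules.filterNot(_.passes(line)).map(_.name)

  // Validate every line in its own Future; results keep the input order.
  def validateAll(rules: Seq[Rule], lines: Seq[String])(
      implicit ec: ExecutionContext): Future[Seq[Seq[String]]] =
    Future.traverse(lines)(line => Future(validateLine(rules, line)))
}
```

Because each line is checked independently, line-level rules parallelize naturally; whole-file rules would need a separate aggregation step.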

HMDA Domain Specific Language


    (lar.loan.amount is numeric) and (lar.loan.amount is greaterThan(0))

    when(lar.loan.purpose is oneOf(2, 3)) {
      lar.preapprovals is equalTo(3)
    }

    when(lar.rateSpread is numeric) {
      when(
        (lar.actionTakenType is equalTo(1)) and
        (lar.lienStatus is equalTo(1)) and
        (lar.rateSpread.toDouble is greaterThan(rateSpread))
      ) {
        lar.hoepaStatus is equalTo(1)
      }
    }
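Syntax like this can be built from a small set of Scala features: an implicit class for the infix `is`, a method `and` on the result type, and a `when` with a by-name body. The following is a minimal sketch under those assumptions, not the HMDA platform's implementation. `MiniDsl`, `MiniLar`, `Predicate`, `Checkable` and the `threshold` parameter are hypothetical, and `greaterThan` is restricted to `Double` to keep the example short.

```scala
import scala.util.Try

object MiniDsl {
  // Outcome of evaluating a rule or one of its clauses.
  sealed trait Result {
    def and(other: Result): Result =
      if (this == Success && other == Success) Success else Failure
  }
  case object Success extends Result
  case object Failure extends Result

  // A test over values of type A.
  final case class Predicate[A](test: A => Boolean)

  // Adds the infix `is` operator to any value.
  implicit class Checkable[A](value: A) {
    def is(p: Predicate[A]): Result = if (p.test(value)) Success else Failure
  }

  def equalTo[A](expected: A): Predicate[A] = Predicate[A](a => a == expected)
  def oneOf[A](allowed: A*): Predicate[A] = Predicate[A](a => allowed.contains(a))
  def greaterThan(bound: Double): Predicate[Double] = Predicate[Double](a => a > bound)
  val numeric: Predicate[String] = Predicate[String](s => Try(s.trim.toDouble).isSuccess)

  // A conditional rule passes vacuously when its condition does not hold.
  // The body is by-name, so it is only evaluated when the condition succeeds.
  def when(condition: Result)(body: => Result): Result =
    if (condition == Success) body else Success

  // Simplified record; not the real LoanApplicationRegister.
  final case class MiniLar(loanPurpose: Int, preapprovals: Int, rateSpread: String, hoepaStatus: Int)

  def preapprovalRule(lar: MiniLar): Result =
    when(lar.loanPurpose is oneOf(2, 3)) {
      lar.preapprovals is equalTo(3)
    }

  // `threshold` is a made-up parameter standing in for the slide's rate spread bound.
  def hoepaRule(lar: MiniLar, threshold: Double): Result =
    when(lar.rateSpread is numeric) {
      when(lar.rateSpread.toDouble is greaterThan(threshold)) {
        lar.hoepaStatus is equalTo(1)
      }
    }
}
```

The by-name body is what makes the nested rule safe: `lar.rateSpread.toDouble` is never evaluated for a value such as "NA", because the outer `when` short-circuits first.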