Created by @jmarinotero
The Home Mortgage Disclosure Act (or HMDA, pronounced HUM-duh) is a United States federal law that requires certain financial institutions to provide mortgage data to the public. Congress enacted HMDA in 1975.
- Wikipedia
It looks like this:
2|8800009923|3|8299422144|20170613|1|2|2|1|5|3|4|20170719|NA |NA|NA |NA |2|2|3| | | | |3| | | | |1|2|37|0| | | |NA |2|1
We get 10 - 20 million rows like this every year
case class LoanApplicationRegister(
id: Int,
respondentId: String,
agencyCode: Int,
loan: Loan,
preapprovals: Int,
actionTakenType: Int,
actionTakenDate: Int,
geography: Geography,
applicant: Applicant,
purchaserType: Int,
denial: Denial,
rateSpread: String,
hoepaStatus: Int,
lienStatus: Int
)
This data needs to be parsed from the specified format. In addition to this, every record goes through a series of "edits" or validation checks for data integrity. There are currently 156 edits, around 30 operate on the whole file (need to check all the data, summarize, etc.). The rest (~120) operate on every line.
We wrote a custom Rule Engine that could execute these rules. We wanted it to have some unique properties:
(lar.loan.amount is numeric) and (lar.loan.amount is greaterThan(0))
when(lar.loan.purpose is oneOf(2, 3)) {
lar.preapprovals is equalTo(3)
}
when(lar.rateSpread is numeric) {
when(
(lar.actionTakenType is equalTo(1)) and
(lar.lienStatus is equalTo(1)) and
(lar.rateSpread.toDouble is greaterThan(rateSpread))
) {
lar.hoepaStatus is equalTo(1)
}
}