Constructing a list of HTML sites on our servers

Has anyone found a good way to build a list of the HTML sites on their Reclaim servers? We can pull information about anything installed or recorded by Installatron from its db file, but I want to know who has set up a pure HTML site and which domains or subdomains they're serving it from.

My thought is to take the full list of registered domains and subdomains and compare it against Installatron's list of installations to find the domains Installatron isn't managing. Then I could run some sort of check against these “empty” domains to see what comes back. Is there a simpler solution I'm missing?
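A rough sketch of that set-difference step in shell, assuming you can export two plain-text files with one domain per line: all_domains.txt (every registered domain and subdomain) and installatron_domains.txt (domains that have an Installatron install). Both file names and the helper functions unmanaged_domains and probe_domain are hypothetical. The probe only records the HTTP status code each leftover domain returns, which is one possible version of "see what comes back."

```shell
#!/usr/bin/env bash
# Sketch: list domains that Installatron doesn't know about, then probe them.
# ASSUMPTION: both input files are hypothetical exports, one domain per line.
set -euo pipefail

# Print domains present in the full list but absent from the Installatron list.
unmanaged_domains() {
  local all="$1" managed="$2"
  # comm needs sorted input; -23 keeps only lines unique to the first file
  comm -23 <(sort -u "$all") <(sort -u "$managed")
}

# Print "<http status><TAB><domain>"; 000 means curl couldn't connect at all.
probe_domain() {
  local domain="$1" code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "http://$domain/") || code="000"
  printf '%s\t%s\n' "$code" "$domain"
}

# Only run when executed directly with two file arguments.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -eq 2 ]]; then
  unmanaged_domains "$1" "$2" | while read -r d; do probe_domain "$d"; done
fi
```

Saved as, say, unmanaged.sh, you'd run it as `./unmanaged.sh all_domains.txt installatron_domains.txt`. Both lists go through `sort -u` first because `comm` silently gives wrong results on unsorted input.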

I was going to suggest

find /home/ -type f -name "index.html"

But it looks like that would turn up a lot of false positives, since a certain analytics package, as well as some plugins and other items, drops its own index.html into its directories. I wonder, though, whether you could restrict the search to the document root of each hosted domain/subdomain and cut out much of the noise by not descending into subdirectories.
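A minimal sketch of that idea, assuming a hypothetical docroots.txt that lists one domain and its document root per line, separated by whitespace (you'd have to export this from your control panel yourself; it isn't an existing file). It only looks at the top level of each document root. As a rough heuristic of my own, it also skips any root that has an index.php, since that usually points to a PHP application rather than a pure HTML site.

```shell
#!/usr/bin/env bash
# Sketch: report domains whose document root has a top-level index.html/.htm.
# ASSUMPTION: docroots.txt is a hypothetical "domain docroot" list, one per line.
set -euo pipefail

static_index_sites() {
  local list="$1" domain docroot
  while read -r domain docroot; do
    # Skip blank lines and document roots that don't exist
    [ -n "$domain" ] && [ -d "$docroot" ] || continue
    # Heuristic (assumption): a top-level index.php suggests a PHP app, not pure HTML
    if [ -e "$docroot/index.php" ]; then
      continue
    fi
    # -maxdepth 1 keeps find out of subdirectories, where plugins and
    # analytics packages tend to drop their own index.html files
    if [ -n "$(find "$docroot" -maxdepth 1 -type f \( -name index.html -o -name index.htm \) -print -quit)" ]; then
      printf '%s\t%s\n' "$domain" "$docroot"
    fi
  done < "$list"
}

# Only run when executed directly with one file argument.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -eq 1 ]]; then
  static_index_sites "$1"
fi
```

The output is tab-separated (domain, then document root), so it can be fed straight into `sort`, `cut` or a spreadsheet for a manual review.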