When it comes to managing large sets of content or data, it's important to think like a reptile: articulate that jaw and swallow more than you can chew.
Large data sets need to be managed with prioritisation: make sure the impactful 20% of content/data that generates 80% of engagement is heavily optimised, while the remaining 80% is triaged to be 'reliable'.
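A minimal sketch of that 80/20 split, assuming each item is a simple record with hypothetical 'id' and 'engagement' fields (names are illustrative, not a prescribed schema):

```python
def split_by_pareto(items, threshold=0.8):
    """Return (head, tail): head is the smallest set of items covering
    roughly `threshold` of total engagement, tail is everything else."""
    ranked = sorted(items, key=lambda i: i["engagement"], reverse=True)
    total = sum(i["engagement"] for i in ranked) or 1
    head, running = [], 0.0
    for item in ranked:
        if running / total >= threshold:
            break
        head.append(item)
        running += item["engagement"]
    tail = ranked[len(head):]
    return head, tail

# Example: the head gets hands-on curation, the tail gets automated checks.
items = [
    {"id": "a", "engagement": 900},
    {"id": "b", "engagement": 400},
    {"id": "c", "engagement": 50},
    {"id": "d", "engagement": 30},
]
head, tail = split_by_pareto(items)  # head -> ["a", "b"], tail -> ["c", "d"]
```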
'Reliable' can mean having sanity/validity checks and logic that programmatically drops bad data.
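One way that 'minimum' check could look, as a sketch with hypothetical record fields ('url', 'published', 'body') standing in for whatever your schema actually uses:

```python
from datetime import datetime

def is_valid(record):
    """Basic sanity/validity checks; records that fail are dropped."""
    if not record.get("url", "").startswith("http"):
        return False                      # malformed or missing URL
    if not record.get("body", "").strip():
        return False                      # empty content
    try:
        datetime.fromisoformat(record["published"])
    except (KeyError, ValueError):
        return False                      # missing or unparseable date
    return True

def triage(records):
    """Keep valid records, silently drop the rest."""
    return [r for r in records if is_valid(r)]
```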
Doing more with less, prioritising management of the important subsets of data while running the 'minimum' quality checks on the remainder, can be a successful strategy in content and data curation.