Familiarity with processing data in distributed computing and parallel processing environments using programming models such as MapReduce.
Eagerness to learn emerging technologies, such as Spark on Apache Hadoop; understanding of the YARN framework.
Real-time computation with Storm or Spark Streaming.
Message brokering for inbound data architectures: Flume, Kafka, or RabbitMQ (conceptual understanding is sufficient).
SQL, with experience on MPP databases such as Greenplum, HAWQ, or Aster Data.
Familiarity with cloud deployment (AWS).
Experience managing files on the Hadoop Distributed File System (HDFS).
Experience with Pig.
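To illustrate the MapReduce programming model named above, here is a minimal local sketch of its three phases (map, shuffle, reduce) as a word count in plain Python; the function names and sample documents are illustrative, not part of any specific framework's API.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: sort and group intermediate pairs by key,
    # as the framework does between the map and reduce phases.
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, [value for _, value in group]

def reduce_phase(grouped):
    # Reduce: aggregate the grouped values for each key.
    return {key: sum(values) for key, values in grouped}

docs = ["big data big wins", "data pipelines"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {'big': 2, 'data': 2, 'pipelines': 1, 'wins': 1}
```

In a real Hadoop job the same map and reduce logic would run in parallel across HDFS blocks, with the framework handling the shuffle over the network.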