I am new to the data engineering field and am currently learning about the Hadoop Distributed File System (HDFS) and its uses. I want to run a few HDFS commands from a Python script so that they all execute in sequence. The jobs I want to perform are:
- copy a file from the local file system to HDFS
- download a file from HDFS to the local file system
- read various kinds of files stored in HDFS, such as text, Avro, CSV and Parquet files.
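For the first two jobs, one possible approach is a thin `subprocess` wrapper around the standard `hdfs dfs -put` and `hdfs dfs -get` shell commands. This is a minimal sketch that assumes the `hdfs` executable is on the `PATH` of the Python process. The helper names (`put_cmd`, `get_cmd`, `run_in_sequence`) and all file paths are hypothetical, chosen only for illustration.

```python
import shlex
import subprocess
from typing import List, Sequence


def hdfs_dfs(*args: str) -> List[str]:
    """Build an `hdfs dfs ...` command as an argument list (no shell involved)."""
    return ["hdfs", "dfs", *args]


def put_cmd(local_path: str, hdfs_path: str, overwrite: bool = False) -> List[str]:
    """Copy a local file into HDFS; `-f` overwrites an existing destination."""
    flags = ["-f"] if overwrite else []
    return hdfs_dfs("-put", *flags, local_path, hdfs_path)


def get_cmd(hdfs_path: str, local_path: str) -> List[str]:
    """Download a file from HDFS to the local file system."""
    return hdfs_dfs("-get", hdfs_path, local_path)


def run_in_sequence(commands: Sequence[List[str]], dry_run: bool = False) -> List[str]:
    """Run commands one after another and collect their stdout.

    With dry_run=True nothing is executed; the shell-quoted commands are
    returned instead, which is handy for checking a sequence before running it.
    """
    outputs = []
    for cmd in commands:
        if dry_run:
            outputs.append(shlex.join(cmd))  # shlex.join needs Python 3.8+
            continue
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the remaining commands are skipped after a failure.
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        outputs.append(result.stdout)
    return outputs


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    steps = [
        put_cmd("data/input.csv", "/user/me/input.csv", overwrite=True),
        get_cmd("/user/me/input.csv", "downloaded.csv"),
    ]
    for line in run_in_sequence(steps, dry_run=True):
        print(line)
```

Passing an argument list instead of a single string avoids going through a shell, so paths with spaces do not need manual quoting. Dropping `dry_run=True` would actually execute the commands against the cluster.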
I want all of these tasks to be performed from a Python script rather than by typing the respective commands in the terminal. Please help me out, and let me know if a library or module exists that can do this.
My Hadoop version is 3.2.1 and my Python version is 3.8.
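For reading plain text and CSV files, a possible sketch is to stream the file with `hdfs dfs -cat` and parse the output with the standard `csv` module. It assumes the `hdfs` executable is on the `PATH`, that the CSV has a header row, and that the file is small enough to hold in memory. The function names and the `runner` parameter are hypothetical. Parquet and Avro are binary formats, so printing them as text is not useful; dedicated libraries such as `pyarrow` (Parquet, with HDFS support) or `fastavro` (Avro) are likely a better fit for those.

```python
import csv
import io
import subprocess
from typing import Callable, Dict, List

# A runner takes an argument list and returns the command's stdout as text.
Runner = Callable[[List[str]], str]


def run_command(cmd: List[str]) -> str:
    """Run cmd without a shell; raise CalledProcessError on a non-zero exit."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def read_hdfs_text(hdfs_path: str, runner: Runner = run_command) -> str:
    """Return the contents of a text file stored in HDFS via `hdfs dfs -cat`."""
    return runner(["hdfs", "dfs", "-cat", hdfs_path])


def read_hdfs_csv(hdfs_path: str, runner: Runner = run_command) -> List[Dict[str, str]]:
    """Read a CSV file with a header row from HDFS into a list of dicts."""
    text = read_hdfs_text(hdfs_path, runner)
    # DictReader uses the first row as keys; every value stays a string.
    return list(csv.DictReader(io.StringIO(text)))
```

Because the runner is a parameter, the parsing logic can be checked with a fake runner that returns canned text, without a running cluster.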