OiO.lk Community platform!

Unable to read text file in Glue job

  • Thread starter: RushHour (Guest)

I am trying to read a schema from a text file that sits in the same package as my code, but the file cannot be read when the code runs as an AWS Glue job. I will use that schema to create a DataFrame with PySpark. The file loads fine locally. I zip the code files into a .zip archive, upload it to an S3 bucket, and reference it in the Glue job. Everything else works without problems, but the code below fails.

Code:
import os
from pathlib import Path

file_path = os.path.join(Path(os.path.dirname(os.path.relpath(__file__))), "verifications.txt")
multiline_data = None
with open(file_path, 'r') as data_file:
    multiline_data = data_file.read()
self.logger.info(f"Schema is {multiline_data}")

This code throws the following error:

Code:
Error Category: UNCLASSIFIED_ERROR; NotADirectoryError: [Errno 20] Not a directory: 'src.zip/src/ingestion/jobs/verifications.txt'

I also tried with abs_path but it didn't help either. The same block of code works fine locally.

I also tried passing the path "./verifications.txt" directly, but that did not work either.

So how do I read this file?
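
One possible approach, offered as a sketch rather than a confirmed fix for this exact job: the error message shows the path running through 'src.zip', which suggests the module is being imported straight from the archive, so open() is asked to treat a zip file as a directory. The standard library's pkgutil.get_data asks the import system's loader for the bytes instead, which also works for zip imports. The helper name read_packaged_text and the demo package demo_jobs below are made up for illustration; in the layout from the question the package name would presumably be "src.ingestion.jobs", assuming each of those directories contains an __init__.py.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def read_packaged_text(package: str, resource: str, encoding: str = "utf-8") -> str:
    """Read a data file shipped inside `package`, even if the package lives in a .zip.

    Hypothetical helper: it delegates to pkgutil.get_data, which uses the
    package's loader instead of the filesystem.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        # get_data returns None when the loader cannot serve data files.
        raise FileNotFoundError(f"{resource!r} not available in package {package!r}")
    return data.decode(encoding)


def build_demo_zip(directory: str) -> str:
    """Create a small archive that mimics a zipped code package with a schema file."""
    zip_path = os.path.join(directory, "demo_src.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("demo_jobs/__init__.py", "")
        archive.writestr("demo_jobs/verifications.txt", "id INT, name STRING\n")
    return zip_path


# Put the archive on sys.path so that demo_jobs is imported from the zip,
# which is the assumed situation inside the Glue job.
demo_dir = tempfile.mkdtemp()
sys.path.insert(0, build_demo_zip(demo_dir))

schema_text = read_packaged_text("demo_jobs", "verifications.txt")
print(f"Schema is {schema_text}")
```

Inside the job module itself, calling read_packaged_text(__package__, "verifications.txt") would avoid hard-coding the package name, provided the module is imported as part of a package rather than run as a top-level script.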