Scraping the 1A. Risk Factors section from 10-K files

  • Thread starter: patach (Guest)

I am trying to extract the 1A. Risk Factors section from each 10-K file. I have already downloaded the files and saved them as .txt files.

Code:
```
'/content/drive/My Drive/Colab Notebooks/10/BKR/1.txt'
'/content/drive/My Drive/Colab Notebooks/10/BKR/2.txt'
```

So the folder 10 contains several subfolders (such as BKR), and each subfolder contains several 10-K filings saved as .txt files.
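
A layout like this can be walked with `pathlib`. The following is only a sketch: the function name `list_filings` is made up for illustration, and it assumes every subfolder of the root holds its filings directly as `.txt` files.

```python
from pathlib import Path


def list_filings(root):
    """Map each subfolder name (e.g. 'BKR') to its sorted .txt file paths."""
    filings = {}
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir():  # skip stray files sitting next to the subfolders
            # Paths sort as strings, so '10.txt' would come before '2.txt'.
            filings[sub.name] = sorted(sub.glob("*.txt"))
    return filings
```

Calling `list_filings('/content/drive/My Drive/Colab Notebooks/10')` should then give one entry per subfolder, each holding the paths of that subfolder's filings.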

I tried the code below to get the 1A. Risk Factors section, but it failed. I would be grateful for any suggestions.

Code:
```
import re
import os, os.path

PATH = '/content/drive/My Drive/Colab Notebooks/10/BKR'

conclusions = []
for file in os.listdir(path):
    with open(os.path.join(PATH, file)) as f:
        data = f.read()

    conclusion = re.search('1a: (.*?)([A-Z]{2,})', data).group(1)
    conclusions.append(conclusion)
```

The error message I got:

Code:
```
---------------------------------------------------------------------------
NotADirectoryError                        Traceback (most recent call last)
<ipython-input-12-051ca10fbeb3> in <module>()
      5 
      6 conclusions = []
----> 7 for file in os.listdir(path):
      8     with open(os.path.join(PATH, file)) as f:
      9         data = f.read()

NotADirectoryError: [Errno 20] Not a directory: '/content/drive/My Drive/Colab Notebooks/10/APA/1.txt
'
```
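
Judging from the traceback, the loop calls `os.listdir(path)` with a lowercase `path`, while the snippet only defines `PATH`. In a notebook, `path` is presumably a leftover variable from an earlier cell that holds the path of a single file (with a trailing newline), which would explain the `NotADirectoryError`; using `os.listdir(PATH)` should get past that. Separately, the pattern `'1a: '` seems unlikely to match, since 10-K headings are typically written along the lines of "Item 1A. Risk Factors". The sketch below assumes that heading format and assumes the section ends at the next "Item 1B" or "Item 2" heading. The name `extract_risk_factors` is made up for illustration, and real filings vary enough that the pattern may need adjusting.

```python
import re

# Assumed heading format: "Item 1A. Risk Factors" ... up to "Item 1B" or "Item 2".
# Real filings vary, so this pattern may need adjusting.
SECTION_RE = re.compile(
    r"item\s+1a\.?\s*risk\s+factors(.*?)(?=item\s+(?:1b|2)\b)",
    re.IGNORECASE | re.DOTALL,
)


def extract_risk_factors(text):
    """Return the longest candidate Item 1A section, or None if nothing matches."""
    candidates = [m.group(1).strip() for m in SECTION_RE.finditer(text)]
    # The table of contents usually matches too, but only as a very short span.
    return max(candidates, key=len) if candidates else None
```

Returning `None` instead of calling `.group(1)` on a failed search avoids an `AttributeError` for filings where the heading is worded differently. A cross-reference such as "see Item 2" inside the section would cut the match short, so the results are worth spot-checking.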