Dockerizing a repo with updated submodules

  • Thread starter: fancybear (Guest)
I have a repo dq-mod with the following layout:

Code:
|-main.py
|-common_utils/
|-custom_utils/
|-dockerfile
|-start.sh

common_utils is a git submodule containing shared utility files. The custom_utils folder contains utilities used only by main.py. Some of the repo contents are packaged into a docker image.

Since common_utils is a submodule, I can't package its contents into the docker image directly.
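
As far as I can tell, docker build only sees what is actually checked out in the build context, so an uninitialized submodule directory is simply empty. A minimal sketch of initializing it before the build, assuming the build runs from the repo root (the dq-mod tag is just an example):

Code:
# populate common_utils/ in the working tree so COPY can see its files
git submodule update --init --recursive
docker build -t dq-mod .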

Requirements:

  1. Package the contents of common_utils during the docker image build
  2. Whenever the docker image is used, pull in the latest contents of common_utils

Dockerfile

Code:
# Debian 11 is recommended.
FROM --platform=linux/amd64 debian:11-slim

# Get environment argument from docker build command.
ARG ENV_ARG
# Set environment variable.
ENV ENV=$ENV_ARG

# Suppress interactive prompts
ENV DEBIAN_FRONTEND=noninteractive

# (Required) Install utilities required by Spark scripts.
RUN apt update && apt install -y procps tini libjemalloc2 git

# Enable jemalloc2 as default memory allocator
ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2

# (Optional) Install and configure Miniconda3.
ENV CONDA_HOME=/opt/miniconda3
ENV PYSPARK_PYTHON=${CONDA_HOME}/bin/python
ENV PATH=${CONDA_HOME}/bin:${PATH}
COPY build_dependencies/Miniconda3-py39_23.1.0-1-Linux-x86_64.sh .
RUN bash Miniconda3-py39_23.1.0-1-Linux-x86_64.sh -b -p /opt/miniconda3 \
  && ${CONDA_HOME}/bin/conda config --system --set always_yes True \
  && ${CONDA_HOME}/bin/conda config --system --set auto_update_conda False \
  && ${CONDA_HOME}/bin/conda config --system --prepend channels conda-forge \
  && ${CONDA_HOME}/bin/conda config --system --set channel_priority strict

# (Optional) Install Conda packages.
#
# The following packages are installed in the default image; it is strongly
# recommended to include all of them.
#
# Use mamba to install packages quickly.
RUN ${CONDA_HOME}/bin/conda install mamba -n base -c conda-forge \
    && ${CONDA_HOME}/bin/mamba install \
      conda \
      cython \
      fastavro \
      fastparquet \
      gcsfs \
      google-cloud-bigquery-storage \
      google-cloud-bigquery[pandas] \
      google-cloud-bigtable \
      google-cloud-container \
      google-cloud-datacatalog \
      google-cloud-dataproc \
      google-cloud-datastore \
      google-cloud-language \
      google-cloud-logging \
      google-cloud-monitoring \
      google-cloud-pubsub \
      google-cloud-redis \
      google-cloud-spanner \
      google-cloud-speech \
      google-cloud-storage \
      google-cloud-texttospeech \
      google-cloud-translate \
      google-cloud-vision \
      koalas \
      matplotlib \
      nltk \
      numba \
      numpy \
      openblas \
      orc \
      pandas \
      pyarrow \
      pysal \
      pytables \
      python \
      regex \
      requests \
      rtree \
      scikit-image \
      scikit-learn \
      scipy \
      seaborn \
      sqlalchemy \
      sympy \
      virtualenv \
      # (Optional) Install additional packages.
      openpyxl

RUN ${CONDA_HOME}/bin/mamba install -n base -c conda-forge great-expectations=0.17.22

# (Optional) Add extra Python modules.
ENV PYTHONPATH=/opt/python/packages
RUN mkdir -p "${PYTHONPATH}"
COPY custom_utils "${PYTHONPATH}/utils"

# Copy the start.sh script from the root directory of the project to /etc
COPY start.sh /etc/start.sh

# Make the start.sh script executable
RUN chmod +x /etc/start.sh

# (Required) Create the 'spark' group/user.
# The GID and UID must be 1099. Home directory is required.
RUN groupadd -g 1099 spark
RUN useradd -u 1099 -g 1099 -d /home/spark -m spark
USER spark

# Set the entrypoint to the start.sh script
ENTRYPOINT ["/etc/start.sh"]
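
For reference, a minimal build/run sketch matching the ARG and ENTRYPOINT above; the image tag, the ENV_ARG value, and the command after the image name are examples. Anything after the image name is forwarded to start.sh and executed by its exec "$@" line:

Code:
# ENV_ARG feeds the ARG instruction at the top of the dockerfile
docker build --build-arg ENV_ARG=dev -t dq-mod .
# everything after the image name becomes "$@" inside start.sh
docker run --rm dq-mod ls -R /opt/python/packages/utils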

start.sh

Code:
#!/bin/bash

# Clone the submodule repository and copy its contents to the utils directory
git config --global --add safe.directory /opt/python/packages/utils
cd /opt/python/packages/utils
git clone --recurse-submodules -j8 https://github.com/org/common-utils.git common_utils
git submodule update --init --recursive
cp -r common_utils/* .
ls -R .

# Remove the cloned repository
rm -rf common_utils

# Execute any additional commands or the main application
exec "$@"

I tried the above scripts, but the common_utils contents are never copied to the utils folder.
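
One way to narrow it down is to bypass the entrypoint and try the same steps interactively; a sketch, assuming the image is tagged dq-mod:

Code:
# open a shell (as the spark user) instead of running start.sh
docker run --rm -it --entrypoint /bin/bash dq-mod
# inside the container: check ownership of the clone target, then try the clone by hand
ls -ld /opt/python/packages/utils
git clone --recurse-submodules https://github.com/org/common-utils.git /tmp/common_utils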