codingecho

Notes on my day-to-day experiences and more

Run an Apache Beam pipeline with a local dependency

To run a Cloud Dataflow pipeline that depends on one of your own local packages, you must create a setup.py and pass it with the --setup_file option.
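As a rough sketch of what passing that option looks like, the helper below builds the argument list for a Dataflow run. Beam's Python SDK spells the flag --setup_file, with an underscore. The function name build_dataflow_args and the project, region, and bucket values are hypothetical placeholders, not something from this post.

```python
def build_dataflow_args(project, region, temp_location, setup_file="./setup.py"):
    """Return command-line style arguments for running a pipeline on Dataflow.

    The values passed in are placeholders; substitute your own project,
    region, and Cloud Storage location.
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Points Beam at the setup.py that packages the local code.
        f"--setup_file={setup_file}",
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for arg in build_dataflow_args("my-project", "us-central1", "gs://my-bucket/tmp"):
        print(arg)
```

In main.py, a list like this would typically be handed to apache_beam.options.pipeline_options.PipelineOptions, or the same flags can be typed directly on the command line after python main.py.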

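Assume you have a directory structure like the one below, where main.py depends on the my_package package. The directory name uses an underscore because a hyphen is not valid in a Python import statement, and it contains an __init__.py so that setuptools.find_packages() can discover it.

Dataflow
  |- my_package/
       |- __init__.py
       |- helper.py
  |- main.py

main.py

from my_package.helper import Helper
...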

With only this layout, the pipeline in main.py cannot run on Cloud Dataflow, because the Dataflow workers do not have the local package installed. To fix this, add a setup.py file to the root directory.

Below is an example of setup.py.

setup.py

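import setuptools

# PyPI packages that the pipeline needs on the workers (none in this example).
REQUIRED_PACKAGES = [
]

setuptools.setup(
    name='my-pipeline',  # distribution names should not contain spaces
    version='0.0.1',
    description='My dataflow package.',
    install_requires=REQUIRED_PACKAGES,
    # Picks up every directory that contains an __init__.py.
    packages=setuptools.find_packages(),
)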
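One thing that is easy to miss: find_packages() only picks up directories that contain an __init__.py. The sketch below is a simplified, standard-library-only stand-in for that discovery rule (it is not the setuptools implementation), run against a temporary copy of the layout from this post. The helper names discover_packages and make_layout are made up for illustration, and the package directory is written as my_package, with an underscore, so that it is importable.

```python
import os
import tempfile


def discover_packages(root):
    """Return dotted names of directories under root that contain __init__.py.

    Simplified imitation of setuptools.find_packages(): a directory without
    __init__.py is skipped and not searched any further.
    """
    found = []

    def walk(directory, prefix):
        for entry in sorted(os.listdir(directory)):
            path = os.path.join(directory, entry)
            if not os.path.isdir(path) or "." in entry:
                continue
            if not os.path.isfile(os.path.join(path, "__init__.py")):
                continue
            name = prefix + entry
            found.append(name)
            walk(path, name + ".")

    walk(root, "")
    return found


def make_layout(root, with_init):
    """Create main.py and my_package/helper.py under root (empty files)."""
    pkg = os.path.join(root, "my_package")
    os.makedirs(pkg)
    open(os.path.join(pkg, "helper.py"), "w").close()
    if with_init:
        open(os.path.join(pkg, "__init__.py"), "w").close()
    open(os.path.join(root, "main.py"), "w").close()


if __name__ == "__main__":
    for with_init in (True, False):
        with tempfile.TemporaryDirectory() as root:
            make_layout(root, with_init)
            print(with_init, discover_packages(root))
```

If the package is missing from the discovered list, setup.py will build a distribution without it and the import in main.py will still fail on the workers.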

Reference: