If you run a Cloud Dataflow pipeline that depends on your own local package, you must create a setup.py and pass it to the pipeline with the --setup_file option.
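As a minimal sketch of how the option might be passed, the snippet below builds the argument list for a Dataflow run. The helper name build_dataflow_args and the project ID, region, and bucket values are placeholders invented for illustration. Only the flag names come from the Apache Beam Python SDK.

```python
def build_dataflow_args(project, region, temp_location, setup_file="./setup.py"):
    """Return command-line style flags for running a pipeline on Dataflow.

    All argument values are supplied by the caller; nothing here contacts
    Google Cloud.
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Points the workers at the setup.py that packages the local code.
        f"--setup_file={setup_file}",
    ]


# Placeholder values (assumptions, not taken from a real project).
args = build_dataflow_args("my-project", "us-central1", "gs://my-bucket/tmp")

# Print the equivalent command line for launching main.py.
print(" ".join(["python", "main.py"] + args))
```

A list like this can be typed after python main.py on the command line, or handed to apache_beam's PipelineOptions inside the pipeline code.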
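Assume that you have the directory structure below, and that main.py depends on the my_package package. The package directory name must be a valid Python identifier (my_package, not my-package), and it needs an __init__.py so that it can be imported and discovered as a package.

Dataflow
|- my_package/
|   |- __init__.py
|   |- helper.py
|- main.py

main.py

from my_package.helper import Helper
...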
As it stands, the pipeline in main.py cannot run on Cloud Dataflow, because the remote workers do not have the local package installed. To fix this, add a setup.py file to the root directory.
Below is an example of setup.py.
setup.py
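import setuptools

# Extra PyPI packages the pipeline needs on the workers (none here).
REQUIRED_PACKAGES = []

setuptools.setup(
    name='My pipeline',
    version='0.0.1',
    description='My dataflow package.',
    install_requires=REQUIRED_PACKAGES,
    # Include every local package found under the root directory.
    packages=setuptools.find_packages(),
)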
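Because the setup.py above relies on find_packages(), a local directory is only shipped if it is recognized as a package. The sketch below is a rough standard-library approximation of that rule for top-level directories (a directory counts only if it contains __init__.py). It is not the setuptools implementation, and the function name find_regular_packages and the directory names are hypothetical.

```python
import os
import tempfile


def find_regular_packages(root):
    """List top-level directories under root that contain an __init__.py."""
    found = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path) and os.path.isfile(os.path.join(path, "__init__.py")):
            found.append(name)
    return found


# Build a throwaway project with two directories; only one has __init__.py.
with tempfile.TemporaryDirectory() as root:
    for name, has_init in [("with_init", True), ("without_init", False)]:
        os.mkdir(os.path.join(root, name))
        open(os.path.join(root, name, "helper.py"), "w").close()
        if has_init:
            open(os.path.join(root, name, "__init__.py"), "w").close()
    packages = find_regular_packages(root)

print(packages)
```

If a local package is missing from the result of a check like this, adding an empty __init__.py to its directory is usually the first thing to try.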