Git is free, open-source software for distributed version control. Git tracks changes to any set of files. With Git, every Git directory on every computer is a full-fledged repository with complete history and full version-tracking abilities.
Refer to https://git-scm.com/ for more information.
HPCC Systems has support for the ECLCC Server to compile ECL code directly from Git repositories. The repositories (and optional branches/users) are configured using environment variables on the server. You can submit a query from a repository branch, and the ECLCC Server will pull the source code from a Git repository and compile it. This allows you to deploy a specific version of a query without needing to perform any work on the client.
Starting with version 8.4, the platform code for Git support was significantly improved. Some of these improvements have been backported to older supported releases, such as 7.12. However, you still need to update to a recent point release to get them. Later releases, such as 8.6, include all of these improvements.
The platform code has been upgraded to significantly improve speed. Compiling from Git repositories is now faster, without added overhead compared with compiling from checked-out sources.
The HPCC Systems platform now supports Git manifests and resources when compiling.
Git-lfs is an extension to Git that improves support for large files and is supported by both GitHub and GitLab. This extension is particularly useful for large resources, for example, if you have Java packages included as part of the manifest.
The HPCC Systems platform code includes support for using multiple Git repositories. With this multiple repository support the HPCC Systems platform now allows each Git repository to be treated as a separate independent package. Dependencies between the repositories are specified in a package file which is checked into the repository and versioned along with the ECL code. The package file indicates what the dependencies are and which versions should be used.
This approach resolves concerns that arise when merging changes from multiple sources into a single repository. In that context, it solves issues with incompatible changes, dependencies, or clashes between modules with the same name, and it ensures that the dependencies between repositories are versioned.
The --main syntax has been extended to allow compiling directly from the repository.
Consider the following command:
ecl run thor --main demo.main@https://github.com/gituser/gch-demo-d#version1 --server=...
This command submits a query to Thor via ESP. It retrieves the ECL code from the 'version1' branch of the https://github.com/gituser/gch-demo-d repository, compiles the code in the demo/main.ecl file, and then runs the query on Thor. The checkout is done on the remote ECLCC Server rather than on the client machine.
The syntax for the reference to the repository is as follows:
<protocol:>//<urn>/<user>/<repository>#version
The protocol and urn can be omitted, in which case a default is used, as in the following example:
ecl run thor --main demo.main@gituser/gch-ecldemo-d#version1 --server=...
This command also submits a query to Thor. It retrieves the ECL code from the 'version1' branch of the gch-ecldemo-d repository, compiles the code in the demo/main.ecl file, and then runs the query on Thor.
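To illustrate how such a reference decomposes, the following Python sketch splits a reference into its location and version parts and adds a default prefix when the protocol and urn are omitted. The helper name split_repo_reference and its splitting rules are assumptions made for illustration only; they are not part of the HPCC Systems tooling, and the platform's own parser may behave differently.

```python
DEFAULT_PREFIX = "https://github.com/"  # the documented default prefix


def split_repo_reference(reference, default_prefix=DEFAULT_PREFIX):
    """Split '<location>#<version>' into (url, version).

    Hypothetical helper for illustration; not the platform's parser.
    """
    location, sep, version = reference.partition("#")
    if "://" not in location:
        # Relative reference such as 'gituser/gch-ecldemo-d':
        # prepend the default prefix.
        location = default_prefix + location
    return location, (version if sep else None)


print(split_repo_reference("gituser/gch-ecldemo-d#version1"))
print(split_repo_reference("https://github.com/gituser/gch-demo-d#3c23ca0"))
```

The version part is returned unchanged because, as far as this sketch is concerned, a branch name, a tag name, and a commit SHA all occupy the same position after the hash (#).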
The version text that follows the hash (#) in the repository reference can take any of the following forms:
The name of a branch
The name of a tag
Note: Currently only lightweight tags are supported. Annotated tags are not yet supported.
The secure hash algorithm (SHA) of a commit
To illustrate, consider the following commands:
ecl run thor --main demo.main@gituser/gch-ecldemo-d#version1 --server=...
This command will retrieve the demo.main ECL code from the 'version1' branch of the gch-ecldemo-d repository.
ecl run thor --main demo.main@gituser/gch-ecldemo-d#3c23ca0 --server=...
This command will retrieve the demo.main ECL code from the commit with the SHA of '3c23ca0'.
You can also specify the name of a tag utilizing this same syntax.
You can use the --syntax option to check the syntax of your code.
The following command checks the syntax of the code in the commit with the SHA of '3c23ca0' of the gch-ecldemo-d repository.
ecl run thor --main demo.main@ghalliday/gch-ecldemo-d#3c23ca0 --syntax
The following command checks the syntax of the code in the 'version1' branch of the gch-ecldemo-d repository.
ecl run thor --main demo.main@ghalliday/gch-ecldemo-d#version1 --syntax
Since the code in a branch can be updated at any time, it is a good idea to always check the syntax.
Consider this package.json file:
{
  "name": "demoRepoC",
  "version": "1.0.0",
  "dependencies": {
    "demoRepoD": "gituser/gch-ecldemo-d#version1"
  }
}
The package file gives a name to the package and defines the dependencies. The dependencies property is a list of key-value pairs. The key (demoRepoD) provides the name of the ECL module that is used to access the external repository. The value is a repository reference which uses the same format as the previous examples using the --main syntax.
To use the external repository in your ECL code you need to add an import definition.
IMPORT layout;
IMPORT demoRepoD AS demoD;

EXPORT personAsText(layout.person input) :=
    input.name + ': ' + demoD.format.maskPassword(input.password);
In the above example, the name demoRepoD in the second IMPORT matches the key in the package.json file. This code uses the attribute format.maskPassword from the version1 branch of the gituser/gch-ecldemo-d repository.
Each package is processed independently of any others. The only connection is through explicit imports of the external packages. This is why packages can have modules or attributes with the same name and they will not clash.
The following is an example of a package.json file using multiple repositories.
{
  "name": "demoRepoC",
  "version": "1.0.0",
  "dependencies": {
    "demoRepoD_V1": "gituser/gch-ecldemo-d#version1",
    "demoRepoD_V2": "gituser/gch-ecldemo-d#version2"
  }
}
Note the dependencies on the 'version1' and 'version2' branches of the gch-ecldemo-d repository.
Likewise, consider the following example ECL query, which uses both dependencies:
IMPORT layout;
IMPORT demoRepoD_V1 AS demo1;
IMPORT demoRepoD_V2 AS demo2;

EXPORT personAsText(layout.person input) :=
    'Was: ' + demo1.format.maskPassword(input.password) +
    ' Now: ' + demo2.format.maskPassword(input.password);
Note that the demoRepoD_V1 and demoRepoD_V2 packages are processed independently.
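A malformed package file, for example one with a missing comma between two dependencies, is easy to overlook, so it can help to check the file before submitting a query. The following Python sketch parses a package file and lists each dependency with its repository and version. The function name list_dependencies is hypothetical and the check is only an illustration; it is not something the platform provides.

```python
import json

PACKAGE_JSON = """
{
  "name": "demoRepoC",
  "version": "1.0.0",
  "dependencies": {
    "demoRepoD_V1": "gituser/gch-ecldemo-d#version1",
    "demoRepoD_V2": "gituser/gch-ecldemo-d#version2"
  }
}
"""


def list_dependencies(package_text):
    """Return {module_name: (repository, version)} from a package file's text."""
    # json.loads raises json.JSONDecodeError if the text is not valid JSON.
    package = json.loads(package_text)
    result = {}
    for module_name, reference in package.get("dependencies", {}).items():
        repository, _, version = reference.partition("#")
        result[module_name] = (repository, version or None)
    return result


for name, (repo, version) in list_dependencies(PACKAGE_JSON).items():
    print(f"{name}: {repo} @ {version}")
```

Each key in the result is the module name used in the ECL IMPORT statement, which is why two versions of the same repository can be listed side by side under different names.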
Command line options have been added to the ECL and ECLCC commands to leverage these improvements in working with Git repositories.
The -R option has been added to the eclcc and ecl commands. Set the -R option to instruct the compiler to use source from a local directory instead of using source from an external repository.
Syntax:
-R<repo>[#version]=path
For example:
ecl run examples/main.ecl -Rgituser/gch-ecldemo-d=/home/myuser/source/demod
This command uses the ECL code for DemoRepoD from /home/myuser/source/demod rather than https://github.com/gituser/gch-ecldemo-d#version1.
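When several repositories are redirected to local directories, the -R arguments can be generated from a mapping rather than typed by hand. The following Python sketch builds them using the -R<repo>[#version]=path syntax shown above. The function name repo_override_args is hypothetical, and the sketch only assembles the command line; it does not run it.

```python
def repo_override_args(overrides):
    """Turn {repo_reference: local_path} into a list of -R arguments."""
    return [f"-R{repo}={path}" for repo, path in overrides.items()]


command = ["ecl", "run", "examples/main.ecl"] + repo_override_args({
    "gituser/gch-ecldemo-d": "/home/myuser/source/demod",
})
print(" ".join(command))
```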
The -v option has been improved to provide more verbose output including the details of the Git requests.
You can use the -v option for debugging. For instance, if repositories are not resolving, issue the command with the -v option as follows to analyse the details of the Git requests.
ecl run examples/main.ecl -v -Rgituser/gch-ecldemo-d=/home/myuser/source/demod
These command line options have been added to the ECL and ECLCC commands.
--defaultgitprefix This command line option changes the default prefix that is added to relative package references. The default can also be configured using the environment variable ECLCC_DEFAULT_GITPREFIX. Otherwise, it defaults to "https://github.com/".
--fetchrepos Determines whether external repositories that have not been cloned locally should be fetched. This defaults to true in 8.6.x. It may be useful to set this option to false if all external repositories are mapped to local directories, to verify that they are being redirected correctly.
--updaterepos Updates external repositories that have previously been fetched locally. This option defaults to true. It is useful to set this option to false if you are working in a situation with no access to the external repositories, or to avoid the overhead of checking for changes if you know there aren't any.
ECLCC_ECLREPO_PATH The directory the external repositories are cloned to. On a client machine this defaults to: <home>/.HPCCSystems/repos (or %APPDATA%\HPCCSystems\repos on Windows). You can delete the contents of this directory to force a clean download of all repositories.
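The defaults described above can be summarized in a short Python sketch. It rests on two assumptions that the text does not state outright: that a value given on the command line takes precedence over ECLCC_DEFAULT_GITPREFIX, and that ECLCC_ECLREPO_PATH is read from the environment. The function names are hypothetical, and the platform's actual resolution logic may differ.

```python
import os
from pathlib import Path


def resolve_git_prefix(cli_value=None, environ=os.environ):
    """Pick the prefix added to relative package references.

    Assumed precedence: command line option, then ECLCC_DEFAULT_GITPREFIX,
    then the documented built-in default.
    """
    return cli_value or environ.get("ECLCC_DEFAULT_GITPREFIX") or "https://github.com/"


def resolve_repo_dir(environ=os.environ, is_windows=(os.name == "nt")):
    """Pick the directory that external repositories are cloned to."""
    explicit = environ.get("ECLCC_ECLREPO_PATH")  # assumed to be an environment variable
    if explicit:
        return Path(explicit)
    if is_windows:
        return Path(environ.get("APPDATA", "")) / "HPCCSystems" / "repos"
    return Path.home() / ".HPCCSystems" / "repos"


# Explicit dictionaries stand in for the real environment in this example.
print(resolve_git_prefix(environ={}))
print(resolve_repo_dir(environ={"ECLCC_ECLREPO_PATH": "/data/eclrepos"}))
```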
The following values are supported for configuring the use of Git within Helm charts for HPCC Systems cloud deployments.
eclccserver.gitUsername - Provides the Git user name
secrets.git - Define the secrets.git to allow repositories to be shared between queries, to be able to cache and share the cloned packages between instances.
eclccserver.gitPlane - This option defines the storage plane that external packages are checked out and cloned to.
For example:
eclccserver:
- name: myeclccserver
  #...
  gitPlane: git/sample/storage
If the gitPlane option is not supplied, the default is the first storage plane with a category of git. If there is no such plane, ECLCC Server uses the first storage plane with a category of dll.
If external repositories are public, such as bundles, then there are no further requirements. Private repositories have the additional complication of requiring authentication information - either on the client or on the ECLCC Server depending on where the source is gathered. Git provides various methods for providing these credentials.
These are the recommended approaches for configuring the credentials on a local system that is interacting with a remote GitHub.
GitHub authentication Download the GitHub command line toolkit. You can then use it to authenticate all Git access with the following command:
gh auth login
This is probably your best option if you are using GitHub. More details can be found on:
ssh key In this scenario, the ssh key associated with a local developer's machine is registered with the GitHub account. This is used when the GitHub reference is of the form ssh://github.com.
The ssh key can be protected with a passcode, and there are various options to avoid having to enter the passcode each time. For more information see:
https://docs.github.com/en/authentication/connecting-to-github-with-ssh/about-ssh
Use a personal access token These are similar to a password, but with additional restrictions on their lifetime and the resources that can be accessed. Here are the details on how to create them. They can then be used with the various Git credential caching options.
An example can be found here:
Generally, for authentication it is preferable to use the https:// protocol instead of the ssh:// protocol for links in package-lock.json files. If ssh:// is used, any machine that processes the dependency must have access to a registered ssh key, which can cause avoidable issues.
All of these authentication options are likely to involve some user interaction, such as passphrases for ssh keys, web interaction with GitHub authentication, and initial entry for cached access tokens. This is problematic for the ECLCC Server, which cannot support user interaction. It is also preferable not to pass credentials around. The solution, therefore, is to use a personal access token securely stored as a secret. This token can be associated with a special service account, which then securely initiates these transactions. The secret avoids the need to pass credentials and allows the keys to be rotated.
This section describes secrets support in the Kubernetes (and bare metal) versions of the HPCC Systems platform.
To add secrets support:
Add the gitUsername property to the eclccserver component of your customization yaml file:
eclccserver:
- name: myeclccserver
  gitUsername: gituser
Note: The eclccserver.gitUsername value should match your Git user name.
Add a secret to the customization yaml file, with a key that matches the gitUsername:
secrets:
  git:
    gituser: my-git-secret
Add the secret to Kubernetes containing the personal access token:
apiVersion: v1
kind: Secret
metadata:
  name: my-git-secret
type: Opaque
stringData:
  password: ghp_eZLHeuoHxxxxxxxxxxxxxxxxxxxxol3986sS=
Note: The password field contains the personal access token.
Apply the secret to your Kubernetes cluster using the kubectl command:
kubectl apply -f ~/dev/hpcc/helm/secrets/my-git-secret
When a query is submitted to the ECLCC Server, any git repositories are then accessed using this configured user name and password.
Alternatively, you can store the personal access token (PAT) in a vault.
This section describes credentials for bare metal systems. Bare metal systems require some similar configuration steps.
Add the gitUsername property to the EclCCServerProcess entry in the environment.xml file.
<EclCCServerProcess daliServers="mydali" ... gitUsername="gituser"
Push out the environment.xml to all nodes.
Either store the credentials as secrets or store in a vault.
As secrets:
Store the access token in:
/opt/HPCCSystems/secrets/git/<user-name>/password
For example:
cat /opt/HPCCSystems/secrets/git/gituser/password
ghp_eZLHeuoHxxxxxxxxxxxxxxxxxxxxol3986sS=
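As a rough sketch of the secrets layout above, the following Python helper writes a token to <secrets-root>/git/<user-name>/password. The function name store_git_token, the owner-only file mode, and the choice to write the token without a trailing newline are assumptions for illustration and are not documented requirements. On a real system the file must remain readable by the account that runs the platform.

```python
import os
import tempfile
from pathlib import Path


def store_git_token(user_name, token, secrets_root="/opt/HPCCSystems/secrets"):
    """Write the token to <secrets_root>/git/<user_name>/password."""
    secret_dir = Path(secrets_root) / "git" / user_name
    secret_dir.mkdir(parents=True, exist_ok=True)
    password_file = secret_dir / "password"
    # Create the file with owner-only permissions (an assumption, not a
    # documented requirement) and write the token without a newline.
    fd = os.open(password_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(token)
    return password_file


# Demonstrate against a temporary directory instead of the real secrets root.
with tempfile.TemporaryDirectory() as demo_root:
    written = store_git_token("gituser", "<personal-access-token>", secrets_root=demo_root)
    print(written.relative_to(demo_root))
```

The directory name under git must match the gitUsername value configured in environment.xml, since that is how the token is located.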
Or for a vault:
You can store the access token inside a vault. You can define a vault within the Software section of the environment. For example:
<Environment>
  <Software>
    ...
    <vaults>
      <git name='my-storage-vault' url="http://127.0.0.1:8200/v1/secret/data/git/${secret}" kind="kv-v2" client-secret="myVaultSecret"/>
      ...
    </vaults>
    ...
Note that the above entries have exactly the same content as the corresponding entries in the Kubernetes values.yaml file.